Word embeddings are biased. But whose bias are they reflecting?

AI and Society 38 (2):975-982 (2023)

Abstract

From curriculum vitae parsing to web search and recommendation systems, Word2Vec and other word embedding techniques have an increasing presence in everyday interactions in human society. Biases, such as gender bias, have been thoroughly researched and shown to be present in word embeddings. Most of this research focuses on discovering and mitigating gender bias within the vector space itself. Whose bias is reflected in word embeddings, however, has not yet been investigated. Besides discovering and mitigating gender bias, it is also important to examine whether a feminine-centric or a masculine-centric view is represented in the biases of word embeddings. This way, we not only gain more insight into the origins of the aforementioned biases, but also present a novel approach to investigating biases in Natural Language Processing systems. Based on previous research in the social sciences and gender studies, we hypothesize that masculine-centric, otherwise known as androcentric, biases are dominant in word embeddings. To test this hypothesis we used the largest English word association test data set publicly available. We compare the distance of the responses of male and female participants to cue words in a word embedding vector space. We found that the word embedding is biased towards a masculine-centric viewpoint, predominantly reflecting the worldviews of the male participants in the word association test data set. By conducting this research, we therefore aim to unravel another layer of bias to be considered when examining fairness in algorithms.
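
The comparison described above can be illustrated with a minimal sketch (not the authors' code): for each cue-response pair produced by a participant, compute the cosine similarity between cue and response in a pretrained embedding, then compare the distribution of similarities for male and female participants. The pretrained model choice, the CSV file name, and the column names (cue, response, gender) are assumptions for illustration only; the paper does not specify them here.

```python
# Sketch: do male or female participants' word-association responses sit
# closer to their cue words in a pretrained word embedding?
import gensim.downloader as api
import pandas as pd

# Pretrained embedding (assumed; the abstract does not name the exact model).
wv = api.load("word2vec-google-news-300")

# Hypothetical word-association data: one row per (cue, response, gender).
df = pd.read_csv("word_associations.csv")  # columns: cue, response, gender

def cue_response_similarity(row):
    """Cosine similarity between cue and response, if both are in vocabulary."""
    if row["cue"] in wv and row["response"] in wv:
        return wv.similarity(row["cue"], row["response"])
    return None

df["similarity"] = df.apply(cue_response_similarity, axis=1)
df = df.dropna(subset=["similarity"])

# If the embedding reflects a masculine-centric viewpoint, male participants'
# responses should on average lie closer (higher cosine similarity) to the cues.
print(df.groupby("gender")["similarity"].describe())
```

Averaging per gender group is only one way to summarize such a comparison; a significance test over the two similarity distributions would be the natural next step.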

Links

PhilArchive




Similar books and articles

Call for papers.[author unknown] - 2018 - AI and Society 33 (3):457-458.
AI and consciousness.Sam S. Rakover - forthcoming - AI and Society:1-2.
Call for papers.[author unknown] - 2018 - AI and Society 33 (3):453-455.
The inside out mirror.Sue Pearson - 2021 - AI and Society 36 (3):1069-1070.
A Look into Modern Working Life.Lena Skiöld - 2002 - AI and Society 16 (1-2):166-167.
Editorial: Beyond regulatory ethics.Satinder P. Gill - 2023 - AI and Society 38 (2):437-438.
The scientist of the scientist.Tomer Simon - 2024 - AI and Society 39 (2):803-804.
Empathetic AI for ethics-in-the-small.Vivek Nallur & Graham Finlay - 2023 - AI and Society 38 (2):973-974.

Analytics

Added to PP
2023-05-02

Downloads
11 (#1,141,924)

6 months
3 (#983,674)
