Discovering and Interpreting Biased Concepts in Online Communities
- URL: http://arxiv.org/abs/2010.14448v2
- Date: Mon, 24 Jan 2022 23:37:26 GMT
- Title: Discovering and Interpreting Biased Concepts in Online Communities
- Authors: Xavier Ferrer-Aran, Tom van Nuenen, Natalia Criado, Jose M. Such
- Abstract summary: Language carries implicit human biases, functioning both as a reflection and a perpetuation of stereotypes that people carry with them.
ML-based NLP methods such as word embeddings have been shown to learn such language biases with striking accuracy.
This paper improves upon, extends, and evaluates our previous data-driven method to automatically discover and help interpret biased concepts encoded in word embeddings.
- Score: 5.670038395203354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language carries implicit human biases, functioning both as a reflection and
a perpetuation of stereotypes that people carry with them. Recently, ML-based
NLP methods such as word embeddings have been shown to learn such language
biases with striking accuracy. This capability of word embeddings has been
successfully exploited as a tool to quantify and study human biases. However,
previous studies only consider a predefined set of biased concepts to attest
(e.g., whether gender is more or less associated with particular jobs), or just
discover biased words without helping to understand their meaning at the
conceptual level. As such, these approaches can be either unable to find biased
concepts that have not been defined in advance, or the biases they find are
difficult to interpret and study. This could make existing approaches
unsuitable to discover and interpret biases in online communities, as such
communities may carry different biases than those in mainstream culture. This
paper improves upon, extends, and evaluates our previous data-driven method to
automatically discover and help interpret biased concepts encoded in word
embeddings. We apply this approach to study the biased concepts present in the
language used in online communities and experimentally show the validity and
stability of our method.
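The core idea of exploiting embeddings to quantify bias can be illustrated with a minimal sketch: rank vocabulary words by their relative cosine association with two attribute-set centroids (e.g., male- vs. female-related words). This is not the paper's exact method (which discovers and clusters biased concepts automatically); it is only an illustration of the underlying association measure, using toy hand-made vectors in place of embeddings trained on community text.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def bias_scores(emb, set_a, set_b, vocab):
    """Score each word by its relative cosine association with two
    attribute sets. Positive = closer to set A, negative = closer to B."""
    ca = np.mean([emb[w] for w in set_a], axis=0)  # centroid of attribute set A
    cb = np.mean([emb[w] for w in set_b], axis=0)  # centroid of attribute set B
    return {w: cosine(emb[w], ca) - cosine(emb[w], cb) for w in vocab}

# Toy 3-d "embeddings" purely for illustration; real ones would come from
# a model such as word2vec trained on a community's posts.
emb = {
    "he":       np.array([1.0, 0.1, 0.0]),
    "she":      np.array([0.0, 0.1, 1.0]),
    "engineer": np.array([0.9, 0.2, 0.1]),
    "nurse":    np.array([0.1, 0.2, 0.9]),
}
scores = bias_scores(emb, ["he"], ["she"], ["engineer", "nurse"])
```

In the toy vectors above, "engineer" is constructed to lie near "he" and "nurse" near "she", so the scores come out positive and negative respectively; with real embeddings, words with strong scores in either direction are the candidates for bias interpretation.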
Related papers
- Semantic Properties of cosine based bias scores for word embeddings [52.13994416317707]
We propose requirements for bias scores to be considered meaningful for quantifying biases.
We analyze cosine based scores from the literature with regard to these requirements.
We underline these findings with experiments to show that the bias scores' limitations have an impact in the application case.
arXiv Detail & Related papers (2024-01-27T20:31:10Z)
- Evaluating Biased Attitude Associations of Language Models in an Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z)
- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood.
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
- The SAME score: Improved cosine based bias score for word embeddings [63.24247894974291]
We provide a bias definition based on the ideas from the literature and derive novel requirements for bias scores.
We propose a new bias score, SAME, to address the shortcomings of existing bias scores and show empirically that SAME is better suited to quantify biases in word embeddings.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Evaluating Metrics for Bias in Word Embeddings [64.55554083622258]
We formalize a bias definition based on the ideas from previous works and derive conditions for bias metrics.
We propose a new metric, SAME, to address the shortcomings of existing metrics and mathematically prove that SAME behaves appropriately.
arXiv Detail & Related papers (2021-11-15T16:07:15Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs [23.574442657224008]
It has been shown that word embeddings can exhibit gender bias, and various methods have been proposed to quantify this.
Previous work has leveraged gender word pairs to measure bias and extract biased analogies.
We show that the reliance on these gendered pairs has strong limitations.
In particular, the well-known analogy "man is to computer-programmer as woman is to homemaker" is due to word similarity rather than societal bias.
arXiv Detail & Related papers (2020-10-06T16:09:05Z)
- Discovering and Categorising Language Biases in Reddit [5.670038395203354]
This paper proposes a data-driven approach to automatically discover language biases encoded in the vocabulary of online discourse communities on Reddit.
We use word embeddings to transform text into high-dimensional dense vectors and capture semantic relations between words.
We successfully discover gender bias, religion bias, and ethnic bias in different Reddit communities.
arXiv Detail & Related papers (2020-08-06T16:42:10Z)
- Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)
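One mitigation strategy that recurs in the list above is instance reweighting, which counters correlations between author demographics and labels. A minimal sketch (illustrative only; the "Balancing out Bias" paper's exact weighting scheme may differ) weights each training example inversely to the frequency of its (demographic, label) pair, so every cell contributes equal total weight:

```python
from collections import Counter

def balanced_weights(demographics, labels):
    """Weight each example inversely to the frequency of its
    (demographic, label) pair, so the classifier cannot exploit
    demographic-label correlations. Weights sum to len(labels)."""
    counts = Counter(zip(demographics, labels))
    n, k = len(labels), len(counts)
    return [n / (k * counts[(d, y)]) for d, y in zip(demographics, labels)]

# Toy data: female authors over-represented in the positive class.
demo = ["f", "f", "f", "m"]
y    = [1,   1,   0,   0]
w = balanced_weights(demo, y)  # rarer (group, label) cells get larger weights
```

The weights would then be passed to a loss function (e.g., a `sample_weight` argument) during training.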
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.