Towards Debiasing Sentence Representations
- URL: http://arxiv.org/abs/2007.08100v1
- Date: Thu, 16 Jul 2020 04:22:30 GMT
- Title: Towards Debiasing Sentence Representations
- Authors: Paul Pu Liang, Irene Mengze Li, Emily Zheng, Yao Chong Lim, Ruslan
Salakhutdinov, Louis-Philippe Morency
- Abstract summary: We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
- Score: 109.70181221796469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As natural language processing methods are increasingly deployed in
real-world scenarios such as healthcare, legal systems, and social science, it
becomes necessary to recognize the role they potentially play in shaping social
biases and stereotypes. Previous work has revealed the presence of social
biases in widely used word embeddings involving gender, race, religion, and
other social constructs. While some methods were proposed to debias these
word-level embeddings, there is a need to perform debiasing at the
sentence-level given the recent shift towards new contextualized sentence
representations such as ELMo and BERT. In this paper, we investigate the
presence of social biases in sentence-level representations and propose a new
method, Sent-Debias, to reduce these biases. We show that Sent-Debias is
effective in removing biases, and at the same time, preserves performance on
sentence-level downstream tasks such as sentiment analysis, linguistic
acceptability, and natural language understanding. We hope that our work will
inspire future research on characterizing and removing social biases from
widely adopted sentence representations for fairer NLP.
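At a high level, Sent-Debias estimates a bias subspace from encoded sentences via PCA and removes each representation's projection onto it. The sketch below illustrates that general idea only; the pairing scheme and function names are illustrative assumptions, not the paper's exact procedure, and real usage would operate on contextualized encoder outputs (e.g. BERT sentence vectors) rather than toy arrays.

```python
import numpy as np

def estimate_bias_subspace(def_pairs, k=1):
    """Estimate a k-dimensional bias subspace from paired sentence
    representations (e.g. encodings of "he is a doctor" / "she is a
    doctor"). Each pair is centered at its mean, and PCA (via SVD) on
    the centered rows yields the principal bias directions."""
    rows = []
    for a, b in def_pairs:
        mu = (a + b) / 2.0
        rows.append(a - mu)
        rows.append(b - mu)
    m = np.array(rows)
    # top-k right singular vectors span the estimated bias subspace
    _, _, vt = np.linalg.svd(m, full_matrices=False)
    return vt[:k]  # shape (k, dim), orthonormal rows

def debias(vecs, bias_basis):
    """Remove each vector's projection onto the bias subspace."""
    proj = vecs @ bias_basis.T @ bias_basis
    return vecs - proj
```

After `debias`, every output vector is orthogonal to the estimated bias directions, which is the property the downstream-task evaluations in the paper check is achieved without degrading task performance.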
Related papers
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint the units (i.e., neurons) in a language model that can be attributed to undesirable behaviors such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by 3 recent Large Language Models (LLMs).
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
- No Word Embedding Model Is Perfect: Evaluating the Representation Accuracy for Social Bias in the Media [17.4812995898078]
We study what kind of embedding algorithm serves best to accurately measure types of social bias known to exist in US online news articles.
We collect 500k articles and review psychology literature with respect to expected social bias.
We compare how models trained with the algorithms on news articles represent the expected social bias.
arXiv Detail & Related papers (2022-11-07T15:45:52Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias, and we identify potential causes of social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Sense Embeddings are also Biased--Evaluating Social Biases in Static and Contextualised Sense Embeddings [28.062567781403274]
One sense of an ambiguous word might be socially biased while its other senses remain unbiased.
We create a benchmark dataset for evaluating the social biases in sense embeddings.
We propose novel sense-specific bias evaluation measures.
arXiv Detail & Related papers (2022-03-14T22:08:37Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- "Thy algorithm shalt not bear false witness": An Evaluation of Multiclass Debiasing Methods on Word Embeddings [3.0204693431381515]
The paper investigates three state-of-the-art multiclass debiasing techniques: Hard debiasing, SoftWEAT debiasing, and Conceptor debiasing.
It evaluates their performance in removing religious bias on a common basis, quantifying bias removal via the Word Embedding Association Test (WEAT), Mean Average Cosine Similarity (MAC), and the Relative Negative Sentiment Bias (RNSB).
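The WEAT mentioned above measures how differently two target sets associate with two attribute sets in embedding space. A minimal sketch of its effect size (Caliskan et al.'s cosine-based statistic, shown here on toy vectors rather than real embeddings; the helper names are illustrative):

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """WEAT effect size:
    s(w, A, B) = mean_a cos(w, a) - mean_b cos(w, b)
    d = (mean_{x in X} s(x) - mean_{y in Y} s(y)) / std_{w in X ∪ Y} s(w)
    where X, Y are target embedding sets and A, B are attribute sets."""
    def s(w):
        return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])
    sx = [s(x) for x in X]
    sy = [s(y) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)
```

A large positive effect size indicates that X associates with A (and Y with B) more strongly than chance; successful debiasing should push this statistic toward zero.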
arXiv Detail & Related papers (2020-10-30T12:49:39Z)
- Discovering and Interpreting Biased Concepts in Online Communities [5.670038395203354]
Language carries implicit human biases, functioning both as a reflection and a perpetuation of stereotypes that people carry with them.
ML-based NLP methods such as word embeddings have been shown to learn such language biases with striking accuracy.
This paper improves upon, extends, and evaluates our previous data-driven method to automatically discover and help interpret biased concepts encoded in word embeddings.
arXiv Detail & Related papers (2020-10-27T17:07:12Z)
- Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.