Language (Technology) is Power: A Critical Survey of "Bias" in NLP
- URL: http://arxiv.org/abs/2005.14050v2
- Date: Fri, 29 May 2020 16:44:18 GMT
- Title: Language (Technology) is Power: A Critical Survey of "Bias" in NLP
- Authors: Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach
- Abstract summary: We survey 146 papers analyzing "bias" in NLP systems.
We find that their motivations are vague, inconsistent, and lacking in normative reasoning.
We propose three recommendations that should guide work analyzing "bias" in NLP systems.
- Score: 11.221552724154986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We survey 146 papers analyzing "bias" in NLP systems, finding that their
motivations are often vague, inconsistent, and lacking in normative reasoning,
despite the fact that analyzing "bias" is an inherently normative process. We
further find that these papers' proposed quantitative techniques for measuring
or mitigating "bias" are poorly matched to their motivations and do not engage
with the relevant literature outside of NLP. Based on these findings, we
describe the beginnings of a path forward by proposing three recommendations
that should guide work analyzing "bias" in NLP systems. These recommendations
rest on a greater recognition of the relationships between language and social
hierarchies, encouraging researchers and practitioners to articulate their
conceptualizations of "bias" (i.e., what kinds of system behaviors are
harmful, in what ways, to whom, and why, as well as the normative reasoning
underlying these statements) and to center work around the lived experiences
of members of communities affected by NLP systems, while interrogating and
reimagining the power relations between technologists and such communities.
Related papers
- Semantic Properties of cosine based bias scores for word embeddings [48.0753688775574]
We propose requirements for bias scores to be considered meaningful for quantifying biases.
We analyze cosine based scores from the literature with regard to these requirements.
We underline these findings with experiments showing that the bias scores' limitations have an impact in practical applications.
arXiv Detail & Related papers (2024-01-27T20:31:10Z)
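As a point of reference for these requirements, here is a minimal sketch of the general form a cosine-based bias score can take (in the spirit of WEAT-style association differences); the function names and attribute sets are illustrative assumptions, not the specific scores analyzed in the paper.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def attribute_association(w: np.ndarray,
                          group_a: list[np.ndarray],
                          group_b: list[np.ndarray]) -> float:
    """Difference in mean cosine similarity of word w to two attribute
    sets (e.g., female-associated vs. male-associated terms).
    Positive values indicate a stronger association with group A."""
    sim_a = np.mean([cosine(w, a) for a in group_a])
    sim_b = np.mean([cosine(w, b) for b in group_b])
    return float(sim_a - sim_b)
```

Requirements of the kind the paper proposes (e.g., a trustworthy zero point, comparability across words) constrain how a difference like this one may be interpreted.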
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and to model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP [64.45845091719002]
Modern NLP systems exhibit a range of biases, which a growing literature on model debiasing attempts to correct.
This paper seeks to clarify the current situation and plot a course for meaningful progress in fair learning.
arXiv Detail & Related papers (2023-02-11T14:54:00Z)
- Undesirable Biases in NLP: Addressing Challenges of Measurement [1.7126708168238125]
We provide an interdisciplinary approach to discussing the issue of NLP model bias by adopting the lens of psychometrics.
We explore two central notions from psychometrics: the construct validity and the reliability of measurement tools.
Our goal is to provide NLP practitioners with methodological tools for designing better bias measures.
arXiv Detail & Related papers (2022-11-24T16:53:18Z)
- Toward Understanding Bias Correlations for Mitigation in NLP [34.956581421295]
This work aims to provide a first systematic study toward understanding bias correlations in mitigation.
We examine bias mitigation in two common NLP tasks -- toxicity detection and word embeddings.
Our findings suggest that biases are correlated and present scenarios in which independent debiasing approaches may be insufficient.
arXiv Detail & Related papers (2022-05-24T22:48:47Z)
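The shape of such a correlation analysis can be sketched as follows; the per-word scores below are made-up stand-ins for bias measurements along two dimensions (e.g., gender and race), not the paper's data.

```python
import numpy as np

# Illustrative bias scores for the same vocabulary along two
# dimensions (values invented for the example).
gender_bias = np.array([0.21, 0.05, -0.13, 0.34, 0.02, -0.07])
racial_bias = np.array([0.18, 0.01, -0.09, 0.29, 0.06, -0.04])

# Pearson correlation between the two bias dimensions. A strong
# positive value suggests the biases co-occur, so debiasing one
# dimension independently may leave or disturb the other.
r = np.corrcoef(gender_bias, racial_bias)[0, 1]
print(f"correlation between bias dimensions: {r:.2f}")
```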
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- A Survey on Bias and Fairness in Natural Language Processing [1.713291434132985]
We analyze the origins of biases, the definitions of fairness, and how bias in different subfields of NLP can be mitigated.
We discuss how future studies can work towards eradicating pernicious biases from NLP algorithms.
arXiv Detail & Related papers (2022-03-06T18:12:30Z)
- Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research [18.424211072825308]
We propose a bias-aware methodology to engage with power relations in natural language processing (NLP) research, developed through an extensive and interdisciplinary literature review.
arXiv Detail & Related papers (2020-11-11T17:04:55Z)
- Case Study: Deontological Ethics in NLP [119.53038547411062]
We study one ethical theory, namely deontological ethics, from the perspective of NLP.
In particular, we focus on the generalization principle and the respect for autonomy through informed consent.
We provide four case studies to demonstrate how these principles can be used with NLP systems.
arXiv Detail & Related papers (2020-10-09T16:04:51Z)
- Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)
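Methods in this family typically estimate a bias subspace and remove each representation's component along it. Below is a minimal sketch of that removal step, assuming the subspace has already been estimated (e.g., via PCA over representations of counterfactual sentence pairs); the function name and toy data are our own illustration, not the authors' code.

```python
import numpy as np

def remove_bias_subspace(reps: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the bias-subspace component from sentence representations.

    reps:  (n, d) matrix of sentence representations.
    basis: (k, d) orthonormal basis of the estimated bias subspace.
    """
    projection = reps @ basis.T @ basis  # component inside the subspace
    return reps - projection

# Toy usage with random data (shapes only; real use would debias
# encoder outputs with a subspace fit on counterfactual pairs).
rng = np.random.default_rng(0)
reps = rng.normal(size=(4, 8))
direction = rng.normal(size=(1, 8))
basis = direction / np.linalg.norm(direction)  # one-dimensional subspace
debiased = remove_bias_subspace(reps, basis)
print(np.allclose(debiased @ basis.T, 0))  # True: bias component removed
```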
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.