What do Bias Measures Measure?
- URL: http://arxiv.org/abs/2108.03362v1
- Date: Sat, 7 Aug 2021 04:08:47 GMT
- Title: What do Bias Measures Measure?
- Authors: Sunipa Dev, Emily Sheng, Jieyu Zhao, Jiao Sun, Yu Hou, Mattie
Sanseverino, Jiin Kim, Nanyun Peng, Kai-Wei Chang
- Abstract summary: Natural Language Processing models propagate social biases about protected attributes such as gender, race, and nationality.
To create interventions and mitigate these biases and associated harms, it is vital to be able to detect and measure such biases.
This work presents a comprehensive survey of existing bias measures in NLP as a function of the associated NLP tasks, metrics, datasets, and social biases and corresponding harms.
- Score: 41.36968251743058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Language Processing (NLP) models propagate social biases about
protected attributes such as gender, race, and nationality. To create
interventions and mitigate these biases and associated harms, it is vital to be
able to detect and measure such biases. While many existing works propose bias
evaluation methodologies for different tasks, there remains a need to
cohesively understand what biases and normative harms each of these measures
captures and how different measures compare. To address this gap, this work
presents a comprehensive survey of existing bias measures in NLP as a function
of the associated NLP tasks, metrics, datasets, and social biases and
corresponding harms. This survey also organizes metrics into different
categories to present advantages and disadvantages. Finally, we propose a
documentation standard for bias measures to aid their development,
categorization, and appropriate usage.
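
As a concrete illustration of the kind of intrinsic bias measure the survey catalogs, the sketch below computes a WEAT-style association score over word embeddings (WEAT is one well-known measure of this family; the tiny random vectors and word lists are purely illustrative, not the survey's own benchmark):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """Mean similarity of word w to attribute set A minus attribute set B."""
    return (np.mean([cosine(w, a) for a in A])
            - np.mean([cosine(w, b) for b in B]))

def weat_effect_size(X, Y, A, B):
    """WEAT-style effect size: difference of mean associations for target
    sets X and Y, normalized by the pooled standard deviation."""
    assoc_x = [association(x, A, B) for x in X]
    assoc_y = [association(y, A, B) for y in Y]
    pooled = np.std(assoc_x + assoc_y, ddof=1)
    return (np.mean(assoc_x) - np.mean(assoc_y)) / pooled

# Toy 3-d embeddings; real measures use pretrained embeddings such as GloVe.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=3) for w in
       ["career", "office", "home", "family", "he", "him", "she", "her"]}

X = [emb["career"], emb["office"]]   # target set 1 (career words)
Y = [emb["home"], emb["family"]]     # target set 2 (family words)
A = [emb["he"], emb["him"]]          # attribute set 1 (male terms)
B = [emb["she"], emb["her"]]         # attribute set 2 (female terms)
# Toy value here; on pretrained embeddings a large magnitude indicates bias.
print(weat_effect_size(X, Y, A, B))
```

Real evaluations use pretrained embeddings, larger word sets, and permutation tests for significance; the survey's point is that many such measures exist and capture different biases and harms.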
Related papers
- Comprehensive Equity Index (CEI): Definition and Application to Bias Evaluation in Biometrics [47.762333925222926]
We present a novel metric to quantify biased behaviors of machine learning models.
We focus on and apply it to the operational evaluation of face recognition systems.
arXiv Detail & Related papers (2024-09-03T14:19:38Z)
- A Principled Approach for a New Bias Measure [7.352247786388098]
We propose the definition of Uniform Bias (UB), the first bias measure with a clear and simple interpretation in the full range of bias values.
Our approach is validated experimentally on nine publicly available datasets and analyzed theoretically, yielding novel insights into the problem.
Based on our approach, we also design a bias mitigation model that might be useful to policymakers.
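
The Uniform Bias formula itself is not reproduced in this summary, so as a point of reference here is a minimal sketch of a familiar baseline measure, the demographic parity gap, whose value also has a simple interpretation over its full range ([-1, 1], with 0 meaning parity); all data below are made up:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Difference in positive-prediction rates between two groups.
    0 means parity; the value is bounded in [-1, 1]."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == "A"].mean()
    rate_b = y_pred[group == "B"].mean()
    return rate_a - rate_b

y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_gap(y_pred, group))  # 0.75 - 0.25 = 0.5
```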
arXiv Detail & Related papers (2024-05-20T18:14:33Z)
- The Impact of Differential Feature Under-reporting on Algorithmic Fairness [86.275300739926]
We present an analytically tractable model of differential feature under-reporting,
which we then use to characterize the impact of this kind of data bias on algorithmic fairness.
Our results show that, in real-world data settings, under-reporting typically leads to increasing disparities.
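
The analytical model is not spelled out in the summary, but a small simulation under assumed settings can illustrate the reported effect: when one group's predictive feature is more often missing (recorded as zero), a decision rule on the recorded data selects that group at a lower rate, even though the true feature is identically distributed:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000
group = rng.integers(0, 2, n)             # group membership: 0 or 1
x_true = rng.normal(size=n)               # equally predictive for both groups
y = (x_true + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Differential under-reporting: group 1's feature is missing 60% of the
# time (recorded as 0); group 0's is missing only 10% of the time.
miss_rate = np.where(group == 1, 0.6, 0.1)
observed = np.where(rng.random(n) < miss_rate, 0.0, x_true)

# A simple decision rule trained/applied on the *recorded* feature.
selected = observed > 0

for g in (0, 1):
    print(f"group {g}: selection rate = {selected[group == g].mean():.3f}")
# Group 1's selection rate drops sharply despite identical true features.
```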
arXiv Detail & Related papers (2024-01-16T19:16:22Z)
- This Prompt is Measuring <MASK>: Evaluating Bias Evaluation in Language Models [12.214260053244871]
We analyse the body of work that uses prompts and templates to assess bias in language models.
We draw on a measurement modelling framework to create a taxonomy of attributes that capture what a bias test aims to measure.
Our analysis illuminates the scope of possible bias types the field is able to measure, and reveals types that are as yet under-researched.
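
A stripped-down version of the prompt-and-template testing this paper analyzes might compare a masked language model's scores for gendered pronouns in a single template; the model choice, template, and target words below are arbitrary illustrations (this requires the transformers library and a model download):

```python
# pip install transformers torch
from transformers import pipeline

# Any masked LM works; bert-base-uncased is used here only as an example.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

template = "the doctor said that [MASK] would review the results."
predictions = unmasker(template, targets=["he", "she"])

for p in predictions:
    print(f"{p['token_str']}: {p['score']:.4f}")
# A large probability gap between 'he' and 'she' is one (contested) signal
# of gender bias; this paper's taxonomy asks what such gaps actually measure.
```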
arXiv Detail & Related papers (2023-05-22T06:28:48Z)
- Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP [64.45845091719002]
Modern NLP systems exhibit a range of biases, which a growing literature on model debiasing attempts to correct.
This paper seeks to clarify the current situation and plot a course for meaningful progress in fair learning.
arXiv Detail & Related papers (2023-02-11T14:54:00Z)
- Trustworthy Social Bias Measurement [92.87080873893618]
In this work, we design bias measures that warrant trust based on the cross-disciplinary theory of measurement modeling.
We operationalize our definition by proposing DivDist, a general bias measurement framework, which we use to instantiate five concrete bias measures.
We demonstrate considerable evidence to trust our measures, showing they overcome conceptual, technical, and empirical deficiencies present in prior measures.
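
The summary does not give DivDist's construction, so the sketch below only illustrates the general divergence-over-distributions idea with assumed ingredients: hypothetical co-occurrence counts, add-one smoothing, and Jensen-Shannon divergence:

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base 2,
    so the value lies in [0, 1])."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical counts of words appearing near mentions of two social groups.
vocab = ["brilliant", "caring", "leader", "assistant"]
counts_group1 = np.array([40, 10, 35, 15]) + 1  # add-one smoothing
counts_group2 = np.array([15, 40, 10, 35]) + 1

bias_score = js_divergence(counts_group1, counts_group2)
print(f"divergence-based bias score: {bias_score:.3f}")  # 0 = identical usage
```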
arXiv Detail & Related papers (2022-12-20T18:45:12Z)
- Choose Your Lenses: Flaws in Gender Bias Evaluation [29.16221451643288]
We assess the current paradigm of gender bias evaluation and identify several flaws in it.
First, we highlight the importance of extrinsic bias metrics that measure how a model's performance on some task is affected by gender.
Second, we find that datasets and metrics are often coupled, and discuss how their coupling hinders the ability to obtain reliable conclusions.
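
A minimal sketch of an extrinsic metric in this spirit, with hypothetical task predictions annotated by entity gender, could look like the following:

```python
import numpy as np

def accuracy_gap(y_true, y_pred, gender):
    """Extrinsic bias metric: task accuracy on female-labeled examples
    minus accuracy on male-labeled examples (0 means no gap)."""
    y_true, y_pred, gender = map(np.asarray, (y_true, y_pred, gender))
    acc = {g: (y_pred[gender == g] == y_true[gender == g]).mean()
           for g in ("F", "M")}
    return acc["F"] - acc["M"]

# Hypothetical coreference-style results annotated with entity gender.
y_true = [1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(accuracy_gap(y_true, y_pred, gender))  # 0.75 - 1.00 = -0.25
```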
arXiv Detail & Related papers (2022-10-20T17:59:55Z)
- Debiasing isn't enough! -- On the Effectiveness of Debiasing MLMs and their Social Biases in Downstream Tasks [33.044775876807826]
We study the relationship between task-agnostic intrinsic and task-specific extrinsic social bias evaluation measures for Masked Language Models (MLMs).
We find that there exists only a weak correlation between these two types of evaluation measures.
arXiv Detail & Related papers (2022-10-06T14:08:57Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
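
One standard quantification technique is adjusted classify-and-count, which corrects a classifier's raw positive rate using its error rates estimated on validation data; whether the paper uses this exact estimator is not stated in the summary:

```python
def adjusted_classify_and_count(observed_pos_rate, tpr, fpr):
    """Estimate the true prevalence of the positive class from a classifier's
    observed positive-prediction rate, correcting for its error rates:
        observed = prevalence * tpr + (1 - prevalence) * fpr
    Solving for prevalence gives (observed - fpr) / (tpr - fpr)."""
    est = (observed_pos_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, est))  # clip to a valid proportion

# A sensitive-attribute classifier fires on 30% of examples; on validation
# data it has tpr=0.8 and fpr=0.1 (all numbers are illustrative).
print(adjusted_classify_and_count(0.30, tpr=0.8, fpr=0.1))  # ~0.286
```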
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- Intrinsic Bias Metrics Do Not Correlate with Application Bias [12.588713044749179]
This research examines whether easy-to-measure intrinsic metrics correlate well with real-world extrinsic metrics.
We measure both intrinsic and extrinsic bias across hundreds of trained models covering different tasks and experimental conditions.
We advise that efforts to debias embedding spaces always be paired with measurement of downstream model bias, and suggest that the community increase its efforts to make downstream measurement more feasible by creating additional challenge sets and annotated test data.
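
Pairing the two kinds of measurement amounts to checking, across many trained models, whether intrinsic scores track extrinsic ones; a minimal sketch with placeholder scores:

```python
import numpy as np

# One intrinsic score (e.g., an embedding association score) and one
# extrinsic score (e.g., a downstream performance gap) per trained model.
intrinsic = np.array([0.12, 0.45, 0.30, 0.80, 0.55, 0.20])
extrinsic = np.array([0.50, 0.10, 0.60, 0.30, 0.20, 0.55])

r = np.corrcoef(intrinsic, extrinsic)[0, 1]
print(f"Pearson r = {r:.2f}")
# A weak or unstable correlation, as this paper reports, means a low
# intrinsic score is not evidence of low downstream (application) bias.
```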
arXiv Detail & Related papers (2020-12-31T18:59:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.