Intrinsic Bias Metrics Do Not Correlate with Application Bias
- URL: http://arxiv.org/abs/2012.15859v2
- Date: Sat, 2 Jan 2021 11:41:05 GMT
- Title: Intrinsic Bias Metrics Do Not Correlate with Application Bias
- Authors: Seraphina Goldfarb-Tarrant, Rebecca Marchant, Ricardo Muñoz Sanchez,
Mugdha Pandya, Adam Lopez
- Abstract summary: This research examines whether easy-to-measure intrinsic metrics correlate well with real-world extrinsic metrics.
We measure both intrinsic and extrinsic bias across hundreds of trained models covering different tasks and experimental conditions.
We advise that efforts to debias embedding spaces always be paired with measurement of downstream model bias, and suggest that the community increase effort into making downstream measurement more feasible via the creation of additional challenge sets and annotated test data.
- Score: 12.588713044749179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Language Processing (NLP) systems learn harmful societal biases that
cause them to widely proliferate inequality as they are deployed in more and
more situations. To address and combat this, the NLP community relies on a
variety of metrics to identify and quantify bias in black-box models and to
guide efforts at debiasing. Some of these metrics are intrinsic, and are
measured in word embedding spaces, and some are extrinsic, which measure the
bias present downstream in the tasks that the word embeddings are plugged into.
This research examines whether easy-to-measure intrinsic metrics correlate well
with real-world extrinsic metrics. We measure both intrinsic and extrinsic bias
across hundreds of trained models covering different tasks and experimental
conditions and find that there is no reliable correlation between these metrics
that holds in all scenarios across tasks and languages. We advise that efforts
to debias embedding spaces always be paired with measurement of downstream
model bias, and suggest that the community increase effort into making
downstream measurement more feasible via the creation of additional challenge
sets and annotated test data. We additionally release code, a new intrinsic
metric, and an annotated test set for gender bias in hate speech.
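The abstract describes measuring an intrinsic bias metric in the embedding space and an extrinsic bias metric on a downstream task for many trained models, then checking whether the two correlate. The following is a minimal sketch of that kind of analysis, not the authors' released code: the WEAT-style effect size, the group performance gap, and the use of Spearman correlation are illustrative assumptions.

```python
# Minimal sketch: intrinsic (embedding-space) vs. extrinsic (downstream) bias
# scores for many models, then a correlation test between the two.
import numpy as np
from scipy import stats


def weat_effect_size(target_x, target_y, attr_a, attr_b, emb):
    """Intrinsic metric: WEAT-style effect size over word embeddings.

    target_x/target_y: target word lists (e.g. career vs. family terms)
    attr_a/attr_b:     attribute word lists (e.g. male vs. female terms)
    emb:               dict mapping word -> unit-normalised vector
    """
    def assoc(w):
        # Mean cosine similarity with A minus mean cosine similarity with B.
        return (np.mean([emb[w] @ emb[a] for a in attr_a])
                - np.mean([emb[w] @ emb[b] for b in attr_b]))

    x_assoc = np.array([assoc(w) for w in target_x])
    y_assoc = np.array([assoc(w) for w in target_y])
    pooled_std = np.std(np.concatenate([x_assoc, y_assoc]), ddof=1)
    return (x_assoc.mean() - y_assoc.mean()) / pooled_std


def extrinsic_gap(score_group_a, score_group_b):
    """Extrinsic metric (illustrative): downstream performance gap between
    demographic groups, e.g. difference in F1 on a gender-split test set."""
    return score_group_a - score_group_b


# One (intrinsic, extrinsic) pair per trained model; check whether the
# embedding-space score predicts the downstream one.
intrinsic_scores = []  # e.g. one WEAT effect size per model
extrinsic_scores = []  # e.g. one downstream bias gap per model
# ... fill the lists from your trained models ...
if intrinsic_scores:
    rho, p_value = stats.spearmanr(intrinsic_scores, extrinsic_scores)
    print(f"Spearman rho={rho:.3f} (p={p_value:.3g})")
```

A weak or unstable rho across tasks and languages would illustrate the paper's finding that intrinsic scores alone are not a reliable proxy for downstream bias.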
Related papers
- Analyzing Correlations Between Intrinsic and Extrinsic Bias Metrics of Static Word Embeddings With Their Measuring Biases Aligned [8.673018064714547]
We examine the abilities of intrinsic bias metrics of static word embeddings to predict whether Natural Language Processing (NLP) systems exhibit biased behavior.
A word embedding is one of the fundamental NLP technologies that represents the meanings of words through real vectors, and problematically, it also learns social biases such as stereotypes.
arXiv Detail & Related papers (2024-09-14T02:13:56Z) - Machine Translation Meta Evaluation through Translation Accuracy
Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z) - Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z) - This Prompt is Measuring <MASK>: Evaluating Bias Evaluation in Language
Models [12.214260053244871]
We analyse the body of work that uses prompts and templates to assess bias in language models.
We draw on a measurement modelling framework to create a taxonomy of attributes that capture what a bias test aims to measure.
Our analysis illuminates the scope of possible bias types the field is able to measure, and reveals types that are as yet under-researched.
arXiv Detail & Related papers (2023-05-22T06:28:48Z) - Choose Your Lenses: Flaws in Gender Bias Evaluation [29.16221451643288]
We assess the current paradigm of gender bias evaluation and identify several flaws in it.
First, we highlight the importance of extrinsic bias metrics that measure how a model's performance on some task is affected by gender.
Second, we find that datasets and metrics are often coupled, and discuss how their coupling hinders the ability to obtain reliable conclusions.
arXiv Detail & Related papers (2022-10-20T17:59:55Z) - How Gender Debiasing Affects Internal Model Representations, and Why It
Matters [26.993273464725995]
We show that intrinsic bias is a better indicator of debiasing than the standard WEAT metric.
Our framework provides a comprehensive perspective on bias in NLP models, which can be applied to deploy NLP systems in a more informed manner.
arXiv Detail & Related papers (2022-04-14T08:54:15Z) - The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z) - On the Intrinsic and Extrinsic Fairness Evaluation Metrics for
Contextualized Language Representations [74.70957445600936]
Multiple metrics have been introduced to measure fairness in various natural language processing tasks.
These metrics can be roughly grouped into two categories: 1) extrinsic metrics for evaluating fairness in downstream applications and 2) intrinsic metrics for estimating fairness in upstream language representation models.
arXiv Detail & Related papers (2022-03-25T22:17:43Z) - Information-Theoretic Bias Reduction via Causal View of Spurious
Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against algorithmic bias that incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
arXiv Detail & Related papers (2022-01-10T01:19:31Z) - Evaluating Metrics for Bias in Word Embeddings [44.14639209617701]
We formalize a bias definition based on the ideas from previous works and derive conditions for bias metrics.
We propose a new metric, SAME, to address the shortcomings of existing metrics and mathematically prove that SAME behaves appropriately.
arXiv Detail & Related papers (2021-11-15T16:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.