Societal Biases in Retrieved Contents: Measurement Framework and
Adversarial Mitigation for BERT Rankers
- URL: http://arxiv.org/abs/2104.13640v1
- Date: Wed, 28 Apr 2021 08:53:54 GMT
- Title: Societal Biases in Retrieved Contents: Measurement Framework and
Adversarial Mitigation for BERT Rankers
- Authors: Navid Rekabsaz and Simone Kopeinik and Markus Schedl
- Abstract summary: We provide a novel framework to measure the fairness in the retrieved text contents of ranking models.
We propose an adversarial bias mitigation approach applied to state-of-the-art BERT rankers.
Our results on the MS MARCO benchmark show that, while the fairness of all ranking models is lower than that of ranker-agnostic baselines, the fairness in retrieved contents significantly improves when applying the proposed adversarial training.
- Score: 9.811131801693856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Societal biases resonate in the retrieved contents of information retrieval
(IR) systems, resulting in reinforcing existing stereotypes. Approaching this
issue requires established measures of fairness regarding the representation of
various social groups in retrieved contents, as well as methods to mitigate
such biases, particularly in the light of the advances in deep ranking models.
In this work, we first provide a novel framework to measure the fairness in the
retrieved text contents of ranking models. By introducing a ranker-agnostic
measurement, the framework also makes it possible to disentangle the effect of
the collection on fairness from that of the rankers. Second, we propose an
adversarial bias mitigation approach applied to state-of-the-art BERT rankers, which
jointly learns to predict relevance and remove protected attributes. We conduct
experiments on two passage retrieval collections (MS MARCO Passage Re-ranking
and TREC Deep Learning 2019 Passage Re-ranking), which we extend by fairness
annotations of a selected subset of queries regarding gender attributes. Our
results on the MS MARCO benchmark show that, while the fairness of all ranking
models is lower than that of ranker-agnostic baselines, the fairness in
retrieved contents significantly improves when applying the proposed
adversarial training. Lastly, we investigate the trade-off between fairness and
utility, showing that, by applying a combinatorial model selection method,
we can maintain the significant improvements in fairness without any
significant loss in utility.
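To make the first contribution more concrete, below is a minimal, illustrative sketch of how fairness of retrieved contents could be measured: each document receives a group-representation (neutrality) score from counts of group-associated terms, the top-k ranking aggregates these with a rank discount, and normalizing by the best value attainable from the collection separates the collection's effect from the ranker's. The term lists, discount, and normalization are assumptions for illustration, not the paper's exact metric.

```python
# Illustrative fairness measurement over retrieved contents (not the paper's exact metric).
import math
from collections import Counter

FEMALE_TERMS = {"she", "her", "woman", "women"}   # illustrative term lists, not the paper's annotations
MALE_TERMS = {"he", "his", "man", "men"}

def neutrality(doc: str) -> float:
    """1.0 if a document mentions both groups equally (or neither), lower otherwise."""
    tokens = Counter(doc.lower().split())
    f = sum(tokens[t] for t in FEMALE_TERMS)
    m = sum(tokens[t] for t in MALE_TERMS)
    total = f + m
    return 1.0 if total == 0 else 1.0 - abs(f - m) / total

def ranking_fairness(ranked_docs: list[str], k: int = 10) -> float:
    """Rank-discounted neutrality of the top-k retrieved documents."""
    return sum(neutrality(d) / math.log2(i + 2) for i, d in enumerate(ranked_docs[:k]))

def normalized_fairness(ranked_docs: list[str], collection: list[str], k: int = 10) -> float:
    """Divide by the best value attainable from the collection, so the score
    reflects the ranker's contribution rather than the collection's."""
    ideal = sorted(collection, key=neutrality, reverse=True)
    best = ranking_fairness(ideal, k)
    return ranking_fairness(ranked_docs, k) / best if best > 0 else 0.0

collection = ["he is a doctor", "she is a doctor", "the capital of austria is vienna"]
print(normalized_fairness(["he is a doctor", "the capital of austria is vienna"], collection, k=2))
```

For the second contribution, here is a minimal sketch of an adversarial mitigation setup, assuming a standard gradient-reversal adversary on a shared encoder representation: one head predicts query-passage relevance, a second head tries to recover the protected attribute, and reversing its gradient pushes the encoder to discard that attribute. The stand-in encoder, loss weighting, and hyperparameters are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of adversarial bias mitigation for a neural ranker (illustrative, not the authors' code).
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass, negated (scaled) gradient on the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class AdversarialRanker(nn.Module):
    def __init__(self, hidden=128, n_protected=2, lambd=1.0):
        super().__init__()
        # Tiny stand-in for a BERT cross-encoder over the concatenated query + passage.
        self.encoder = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.relevance_head = nn.Linear(hidden, 1)             # predicts relevance
        self.adversary_head = nn.Linear(hidden, n_protected)   # predicts protected attribute
        self.lambd = lambd

    def forward(self, reps):
        h = self.encoder(reps)
        rel = self.relevance_head(h).squeeze(-1)
        # Gradient reversal: the adversary learns to detect the attribute while the
        # encoder is simultaneously updated to make that detection harder.
        adv = self.adversary_head(GradientReversal.apply(h, self.lambd))
        return rel, adv


# Joint objective: relevance loss plus the (reversed) adversarial loss.
model = AdversarialRanker()
reps = torch.randn(8, 128)               # placeholder encoder inputs
rel_labels = torch.rand(8)               # relevance targets in [0, 1]
attr_labels = torch.randint(0, 2, (8,))  # protected-attribute labels
rel, adv = model(reps)
loss = nn.functional.binary_cross_entropy_with_logits(rel, rel_labels) \
     + nn.functional.cross_entropy(adv, attr_labels)
loss.backward()
```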
Related papers
- A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System [9.470545149911072]
This paper proposes a normative framework to benchmark consumer fairness in LLM-powered recommender systems.
We argue that this gap can lead to arbitrary conclusions about fairness.
Experiments on the MovieLens dataset on consumer fairness reveal fairness deviations in age-based recommendations.
arXiv Detail & Related papers (2024-05-03T16:25:27Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
- GaussianMLR: Learning Implicit Class Significance via Calibrated Multi-Label Ranking [0.0]
We propose a novel multi-label ranking method: GaussianMLR.
It aims to learn implicit class significance values that determine the positive label ranks.
We show that our method is able to accurately learn a representation of the incorporated positive rank order.
arXiv Detail & Related papers (2023-03-07T14:09:08Z)
- Conditional Supervised Contrastive Learning for Fair Text Classification [59.813422435604025]
We study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning.
Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives.
arXiv Detail & Related papers (2022-05-23T17:38:30Z)
- Debiasing Neural Retrieval via In-batch Balancing Regularization [25.941718123899356]
We develop a differentiable normed Pairwise Ranking Fairness (nPRF) measure and leverage T-statistics on top of nPRF to improve fairness.
Our method with nPRF achieves significantly less bias with minimal degradation in ranking performance compared with the baseline.
arXiv Detail & Related papers (2022-05-18T22:57:15Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- Fair Tree Learning [0.15229257192293202]
Various optimisation criteria combine classification performance with a fairness metric.
Current fair decision tree methods only optimise for a fixed threshold on both the classification task as well as the fairness metric.
We propose a threshold-independent fairness metric termed uniform demographic parity, and a derived splitting criterion entitled SCAFF -- Splitting Criterion AUC for Fairness.
arXiv Detail & Related papers (2021-10-18T13:40:25Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)