InterFair: Debiasing with Natural Language Feedback for Fair
Interpretable Predictions
- URL: http://arxiv.org/abs/2210.07440v2
- Date: Mon, 23 Oct 2023 10:35:14 GMT
- Authors: Bodhisattwa Prasad Majumder, Zexue He, Julian McAuley
- Abstract summary: We argue that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it.
We explore two interactive setups with a frozen predictive model and show that users able to provide feedback can achieve a better and fairer balance between task performance and bias mitigation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Debiasing methods in NLP models traditionally focus on isolating information
related to a sensitive attribute (e.g., gender or race). We instead argue that
a favorable debiasing method should use sensitive information 'fairly,' with
explanations, rather than blindly eliminating it. This fair balance is often
subjective and can be challenging to achieve algorithmically. We explore two
interactive setups with a frozen predictive model and show that users able to
provide feedback can achieve a better and fairer balance between task
performance and bias mitigation. In one setup, users, by interacting with test
examples, further decreased bias in the explanations (5-8%) while maintaining
the same prediction accuracy. In the other setup, human feedback was able to
disentangle associated bias and predictive information from the input leading
to superior bias mitigation and improved task performance (4-5%)
simultaneously.
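To make the two setups concrete, here is a minimal, hypothetical sketch of the first one: a frozen bag-of-words classifier whose per-token contributions double as a rationale, with user feedback shrinking the importance of tokens flagged as bias-revealing before the prediction is recomputed. The class, the `apply_feedback` helper, and the toy vocabulary weights are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

# Hypothetical frozen predictor: a bag-of-words linear model whose
# per-token contributions double as an importance-based rationale.
class FrozenClassifier:
    def __init__(self, vocab_weights):
        self.vocab_weights = vocab_weights  # token -> weight (fixed)

    def rationale(self, tokens, mask):
        # Token importance = |contribution|, scaled by the user-editable mask.
        return {t: abs(self.vocab_weights.get(t, 0.0)) * m
                for t, m in zip(tokens, mask)}

    def predict(self, tokens, mask):
        score = sum(self.vocab_weights.get(t, 0.0) * m
                    for t, m in zip(tokens, mask))
        return 1 / (1 + np.exp(-score))  # P(positive)

def apply_feedback(tokens, mask, flagged, factor=0.2):
    # User feedback: shrink the importance of tokens flagged as bias-revealing.
    return [m * factor if t in flagged else m for t, m in zip(tokens, mask)]

clf = FrozenClassifier({"brilliant": 2.0, "nurse": 0.4, "she": 0.6})
tokens = ["she", "is", "a", "brilliant", "nurse"]
mask = [1.0] * len(tokens)

print("before:", round(clf.predict(tokens, mask), 3), clf.rationale(tokens, mask))
mask = apply_feedback(tokens, mask, flagged={"she"})  # user flags 'she'
print("after: ", round(clf.predict(tokens, mask), 3), clf.rationale(tokens, mask))
```

In the toy run, the gendered token's share of the rationale drops sharply while the prediction barely moves, mirroring the first setup's reported behavior: less bias in the explanations at the same accuracy.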
Related papers
- Looking at Model Debiasing through the Lens of Anomaly Detection (2024-07-24)
Deep neural networks are sensitive to bias in the data.
We propose a new bias identification method based on anomaly detection.
We reach state-of-the-art performance on synthetic and real benchmark datasets.
- Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction (2024-03-15)
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction.
This paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating prompt bias.
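A common way to estimate such prompt bias is to query the model with a content-free input (e.g., the subject replaced by "N/A") and divide out the resulting label prior. The sketch below shows that probability-level calibration; it is a simpler stand-in for the paper's representation-level mitigation, and all numbers are invented.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def calibrated_probs(logits, content_free_logits):
    # Estimate the prompt's label prior from a content-free input,
    # then divide it out of the real prediction and renormalize.
    p = softmax(logits)
    prior = softmax(content_free_logits)
    debiased = p / (prior + 1e-12)
    return debiased / debiased.sum()

# Toy example: the prompt alone already favors label 0.
logits_cf = np.array([2.0, 0.0, 0.0])       # model(prompt with "N/A")
logits    = np.array([2.2, 1.8, 0.0])       # model(prompt with real subject)
print(calibrated_probs(logits, logits_cf))  # label 1 overtakes label 0
```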
- Improving Bias Mitigation through Bias Experts in Natural Language Understanding (2023-12-06)
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
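A standard use of such an auxiliary model is to down-weight training examples it already classifies confidently, since those are likely solvable via shortcuts; the paper refines this by replacing the single multi-class auxiliary with per-class binary bias experts. A minimal sketch of the weighting step, with invented probabilities:

```python
import numpy as np

def debias_weights(bias_probs, labels):
    # Down-weight training examples the auxiliary bias model already
    # gets right with high confidence: they are likely shortcut-solvable.
    p_correct = bias_probs[np.arange(len(labels)), labels]
    return 1.0 - p_correct  # weight -> 0 for fully bias-predictable examples

# Toy: auxiliary model's probabilities over 3 classes for 4 examples.
bias_probs = np.array([
    [0.95, 0.03, 0.02],   # shortcut example, label 0
    [0.30, 0.40, 0.30],   # hard example, label 1
    [0.10, 0.10, 0.80],   # shortcut example, label 2
    [0.34, 0.33, 0.33],   # ambiguous example, label 0
])
labels = np.array([0, 1, 2, 0])
weights = debias_weights(bias_probs, labels)
# These weights would scale the main model's per-example loss.
print(weights)  # [0.05 0.6  0.2  0.66]
```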
- Robust Natural Language Understanding with Residual Attention Debiasing (2023-05-28)
We propose Residual Attention Debiasing (READ), an end-to-end debiasing method that mitigates unintended biases from attention.
Experiments show that READ significantly improves the performance of BERT-based models on OOD data with shortcuts removed.
- Balancing Unobserved Confounding with a Few Unbiased Ratings in Debiased Recommendations (2023-04-17)
We propose a theoretically guaranteed model-agnostic balancing approach that can be applied to any existing debiasing method.
The proposed approach makes full use of the unbiased data by alternately correcting model parameters learned from the biased data and adaptively learning balance coefficients for the biased samples to debias further; a toy version of this alternation is sketched below.
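A hedged illustration of that alternation, with invented data and update rules: the "model" here is just a weighted mean fitted on biased data, and the balance coefficients are adapted multiplicatively to match the statistics of the small unbiased set. This is not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
X_biased = rng.normal(1.0, 1.0, size=(200, 2))   # over-represents one region
X_unbiased = rng.normal(0.0, 1.0, size=(20, 2))  # a few gold-standard samples
w = np.ones(len(X_biased)) / len(X_biased)       # balance coefficients

for _ in range(100):
    # (1) "Correct the model": here, just the weighted mean of biased data.
    mu = w @ X_biased
    # (2) Adapt the balance coefficients to close the gap to the unbiased mean.
    gap = mu - X_unbiased.mean(axis=0)
    w *= np.exp(-0.1 * X_biased @ gap)           # multiplicative update
    w /= w.sum()

print(np.round(w @ X_biased, 3), np.round(X_unbiased.mean(axis=0), 3))
```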
- Controlling Bias Exposure for Fair Interpretable Predictions (2022-10-14)
We argue that a favorable debiasing method should use sensitive information 'fairly' rather than blindly eliminating it.
Our model achieves a desirable trade-off between debiasing and task performance along with producing debiased rationales as evidence.
- Self-supervised debiasing using low rank regularization (2022-10-11)
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability.
We propose a self-supervised debiasing framework potentially compatible with unlabeled samples.
Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias (2022-08-10)
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach to auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
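A toy version of that interaction loop, assuming a linear structural causal model: the user weakens the coefficient on a biased edge (here a hypothetical gender -> hire edge) and the system resimulates a debiased dataset. The variables, coefficients, and the SEM itself are illustrative, not D-BIAS's actual machinery.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

def simulate(beta_gender_hire):
    # Linear SEM: gender -> hire (biased edge), skill -> hire (legitimate edge).
    gender = rng.integers(0, 2, n)
    skill = rng.normal(0, 1, n)
    hire = beta_gender_hire * gender + 1.0 * skill + rng.normal(0, 0.5, n)
    return gender, hire

# Audit: group gap under the original (biased) causal model.
g, h = simulate(beta_gender_hire=0.8)
print("gap before:", round(h[g == 1].mean() - h[g == 0].mean(), 2))

# Interaction: weaken the gender -> hire edge and resimulate a debiased dataset.
g, h = simulate(beta_gender_hire=0.1)
print("gap after: ", round(h[g == 1].mean() - h[g == 0].mean(), 2))
```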
- Cross Pairwise Ranking for Unbiased Item Recommendation (2022-04-26)
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove theoretically that this construction offsets the influence of user/item propensity on learning; a toy version of the loss is sketched below.
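The core construction, in miniature: for two observed interactions (u1, i1) and (u2, i2), training prefers the sum of the matched scores over the sum of the crossed scores; if each score decomposes additively into relevance plus user and item log-propensities, the propensity terms cancel in that difference. A minimal sketch with an invented score table, not the paper's full training loop:

```python
import numpy as np

def cpr_loss(s, u1, i1, u2, i2):
    # Matched pairs are the observed interactions; crossed pairs swap the items.
    matched = s[u1, i1] + s[u2, i2]
    crossed = s[u1, i2] + s[u2, i1]
    # If each score is relevance + user/item log-propensity terms, the
    # propensities appear in both sums and cancel in the difference.
    return np.log1p(np.exp(-(matched - crossed)))  # softplus(-margin)

# Toy score matrix s[user, item] (e.g., dot products of embeddings).
s = np.array([[2.0, 0.5],
              [0.3, 1.5]])
print(cpr_loss(s, u1=0, i1=0, u2=1, i2=1))  # small loss: matched > crossed
```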
- Balancing out Bias: Achieving Fairness Through Training Reweighting (2021-09-16)
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
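The reweighting idea in miniature: weight each instance inversely to the frequency of its (demographic, label) pair so that the two are decorrelated in the weighted training distribution. A hedged sketch; the paper's actual scheme may differ in detail:

```python
import numpy as np
from collections import Counter

def balance_weights(demographics, labels):
    # Weight each instance by 1 / count(demographic, label): every
    # (demographic, label) cell then carries equal total weight, breaking
    # the correlation between author demographics and the label.
    counts = Counter(zip(demographics, labels))
    w = np.array([1.0 / counts[(d, y)] for d, y in zip(demographics, labels)])
    return w / w.mean()  # normalize so the average weight is 1

demo   = ["f", "f", "f", "m", "f", "m"]
labels = [ 1,   1,   1,   0,   0,   1 ]
print(balance_weights(demo, labels))  # [0.5 0.5 0.5 1.5 1.5 1.5]
```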
- Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures (2020-10-23)
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
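Concretely, the augmentation appends a linearized predicate-argument frame to each input sentence. In the sketch below the semantic role labeler is stubbed out (`srl_parse` is a placeholder), so only the augmentation format is illustrated:

```python
# A real pipeline would call a semantic role labeler here; we stub it so
# the example is self-contained.
def srl_parse(sentence: str):
    # Placeholder standing in for an actual SRL model.
    return [{"predicate": "chased", "ARG0": "the dog", "ARG1": "the cat"}]

def augment(sentence: str) -> str:
    # Append a linearized predicate-argument structure to the input so the
    # model sees "who did what to whom" alongside the surface form.
    frames = srl_parse(sentence)
    tags = " ".join(
        f"[PRED {f['predicate']} ARG0: {f['ARG0']} ARG1: {f['ARG1']}]"
        for f in frames
    )
    return f"{sentence} {tags}"

print(augment("the dog chased the cat"))
# -> "the dog chased the cat [PRED chased ARG0: the dog ARG1: the cat]"
```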
This list is automatically generated from the titles and abstracts of the papers on this site.