On Transferability of Bias Mitigation Effects in Language Model
Fine-Tuning
- URL: http://arxiv.org/abs/2010.12864v2
- Date: Sun, 11 Apr 2021 23:34:33 GMT
- Title: On Transferability of Bias Mitigation Effects in Language Model
Fine-Tuning
- Authors: Xisen Jin, Francesco Barbieri, Brendan Kennedy, Aida Mostafazadeh
Davani, Leonardo Neves, Xiang Ren
- Abstract summary: Fine-tuned language models have been shown to exhibit biases against protected groups in a host of modeling tasks.
Previous works focus on detecting these biases, reducing bias in data representations, and using auxiliary training objectives to mitigate bias during fine-tuning.
We explore the feasibility and benefits of upstream bias mitigation (UBM) for reducing bias on downstream tasks.
- Score: 30.833538367971872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuned language models have been shown to exhibit biases against
protected groups in a host of modeling tasks such as text classification and
coreference resolution. Previous works focus on detecting these biases,
reducing bias in data representations, and using auxiliary training objectives
to mitigate bias during fine-tuning. Although these techniques achieve bias
reduction for the task and domain at hand, the effects of bias mitigation may
not directly transfer to new tasks, requiring additional data collection and
customized annotation of sensitive attributes, and re-evaluation of appropriate
fairness metrics. We explore the feasibility and benefits of upstream bias
mitigation (UBM) for reducing bias on downstream tasks, by first applying bias
mitigation to an upstream model through fine-tuning and subsequently using it
for downstream fine-tuning. We find, in extensive experiments across hate
speech detection, toxicity detection, occupation prediction, and coreference
resolution tasks over various bias factors, that the effects of UBM are indeed
transferable to new downstream tasks or domains via fine-tuning, creating less
biased downstream models than directly fine-tuning on the downstream task or
transferring from a vanilla upstream model. Though challenges remain, we show
that UBM promises more efficient and accessible bias mitigation in LM
fine-tuning.
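Read as a recipe, UBM is two consecutive fine-tuning runs: first debias an upstream model on a task that has sensitive-attribute annotations, then use that checkpoint (rather than the vanilla pretrained model) as the starting point for downstream fine-tuning. The sketch below illustrates this with a HuggingFace-style classifier; the demographic-parity-style penalty, the hyperparameters, and the `upstream_loader` / `downstream_loader` dataloaders are illustrative assumptions, not the authors' exact objective or released code.
```python
# Sketch of the two-stage UBM recipe (placeholders, not the authors' code):
# Stage 1 fine-tunes an upstream classifier with a bias-mitigation term,
# Stage 2 reuses that checkpoint to initialize ordinary downstream fine-tuning.
import torch
from transformers import AutoModelForSequenceClassification

def upstream_debias_loss(model, batch, lambda_fair=1.0):
    """Task loss plus a penalty on the prediction gap between groups."""
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])
    probs = out.logits.softmax(dim=-1)[:, 1]          # P(positive class)
    g = batch["group"].bool()                         # protected-group indicator
    gap = (probs[g].mean() - probs[~g].mean()).abs()  # assumes both groups in batch
    return out.loss + lambda_fair * gap

# Stage 1: upstream bias mitigation (e.g. hate speech data with group labels).
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
for batch in upstream_loader:                         # hypothetical DataLoader
    opt.zero_grad()
    upstream_debias_loss(model, batch).backward()
    opt.step()
model.save_pretrained("ubm-upstream")

# Stage 2: downstream fine-tuning starts from the debiased checkpoint;
# no fairness term or sensitive-attribute annotation is needed here.
model = AutoModelForSequenceClassification.from_pretrained("ubm-upstream", num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
for batch in downstream_loader:                       # hypothetical DataLoader
    opt.zero_grad()
    model(input_ids=batch["input_ids"],
          attention_mask=batch["attention_mask"],
          labels=batch["labels"]).loss.backward()     # plain task fine-tuning
    opt.step()
```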
Related papers
- Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness [10.081447621656523]
The impact on language modeling ability can be alleviated given a high-quality and long-contextualized debiasing corpus.
The effectiveness of task-agnostic debiasing hinges on the quantitative bias level of both the task-specific data used for downstream applications and the debiased model.
We propose ProSocialTuning, a novel framework that can Propagate Socially-fair Debiasing to Downstream Fine-tuning.
arXiv Detail & Related papers (2024-06-06T15:11:11Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
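For context on the entry above: "low-rank fine-tuning" freezes the pretrained weight matrix and trains only a small rank-r update, which is why so much pretrained behaviour, biases included, carries over into the fine-tuned model. The following is a generic adapter-layer sketch of that idea, not the paper's code.
```python
import torch
from torch import nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable rank-r update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # pretrained weights stay fixed
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Pretrained behaviour plus a low-rank correction; only A and B get gradients.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))
```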
- Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z)
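The blurb above names the components but not the wiring, so the following is a hedged guess at the standard pattern this line of work builds on: one-vs-rest binary "expert" heads act as the bias-identifying auxiliary model, and their confidence on the gold label down-weights examples in the main model's loss. The head layout and the reweighting rule are illustrative, not the paper's exact design.
```python
import torch
from torch import nn
import torch.nn.functional as F

NUM_CLASSES, FEAT_DIM = 3, 64
# One binary "bias expert" per class, fed by a cheap auxiliary representation;
# together they play the role of the bias-identifying auxiliary model.
experts = nn.ModuleList([nn.Linear(FEAT_DIM, 1) for _ in range(NUM_CLASSES)])

def bias_confidence(aux_feats, labels):
    """Probability the bias experts assign to the gold label of each example."""
    logits = torch.cat([head(aux_feats) for head in experts], dim=-1)   # (B, C)
    return torch.sigmoid(logits).gather(1, labels.unsqueeze(1)).squeeze(1)

def reweighted_main_loss(main_logits, aux_feats, labels):
    """Down-weight examples the bias experts already solve (likely bias-aligned)."""
    with torch.no_grad():
        weights = 1.0 - bias_confidence(aux_feats, labels)
    per_example = F.cross_entropy(main_logits, labels, reduction="none")
    return (weights * per_example).mean()

loss = reweighted_main_loss(torch.randn(8, NUM_CLASSES),
                            torch.randn(8, FEAT_DIM),
                            torch.randint(0, NUM_CLASSES, (8,)))
```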
- Overwriting Pretrained Bias with Finetuning Data [36.050345384273655]
We investigate bias when conceptualized both as spurious correlations between the target task and a sensitive attribute and as underrepresentation of a particular group in the dataset.
We find that (1) models finetuned on top of pretrained models can indeed inherit their biases, but (2) this bias can be corrected for through relatively minor interventions to the finetuning dataset.
Our findings imply that careful curation of the finetuning dataset is important for reducing biases on a downstream task, and doing so can even compensate for bias in the pretrained model.
arXiv Detail & Related papers (2023-03-10T19:10:58Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
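As a rough reading of the "balance the contributions" idea in the entry above, the sketch below gives mined bias-aligned and bias-conflicting samples equal weight in the batch loss. The paper's ECS scoring and gradient-alignment rule are more involved, so the conflict mask and the 50/50 weighting here are placeholders.
```python
import torch
import torch.nn.functional as F

def group_balanced_loss(logits, labels, conflict_mask):
    """conflict_mask: True where an ECS-style score flags a bias-conflicting sample."""
    per_example = F.cross_entropy(logits, labels, reduction="none")
    aligned = per_example[~conflict_mask]
    conflicting = per_example[conflict_mask]
    if aligned.numel() == 0 or conflicting.numel() == 0:
        return per_example.mean()            # degenerate batch: plain average
    # Equal say for both groups, regardless of how many samples each has.
    return 0.5 * aligned.mean() + 0.5 * conflicting.mean()

loss = group_balanced_loss(torch.randn(16, 4),
                           torch.randint(0, 4, (16,)),
                           torch.rand(16) > 0.7)
```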
- Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale vision pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
ProReg uses the predictions obtained by prompting the pretrained model to regularize fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T11:53:55Z)
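A minimal reading of the mechanism in the entry above: take the frozen pretrained model's zero-shot, prompt-based prediction as a soft target and add a KL term that keeps the fine-tuned model close to it. How the zero-shot logits are produced and the weight `beta` are assumptions for illustration, not ProReg's actual configuration.
```python
import torch
import torch.nn.functional as F

def prompt_regularized_loss(finetune_logits, zeroshot_logits, labels, beta=0.5):
    """Task cross-entropy plus a KL pull toward the frozen model's prompted prediction."""
    ce = F.cross_entropy(finetune_logits, labels)
    kl = F.kl_div(F.log_softmax(finetune_logits, dim=-1),
                  F.softmax(zeroshot_logits, dim=-1),
                  reduction="batchmean")
    return ce + beta * kl

# zeroshot_logits would come from prompting the frozen pretrained model on the same inputs.
loss = prompt_regularized_loss(torch.randn(8, 10), torch.randn(8, 10),
                               torch.randint(0, 10, (8,)))
```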
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features while accounting for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
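The entry above describes the intent of confidence regularization rather than its formula. One common way to instantiate it is self-distillation in which the teacher distribution is flattened in proportion to a bias-only model's confidence, so the student gets no reward for matching over-confident, bias-driven predictions while still learning from all examples. The scaling rule below is an assumption for illustration, not necessarily the paper's exact choice.
```python
import torch
import torch.nn.functional as F

def confidence_regularized_loss(student_logits, teacher_probs, bias_conf):
    """bias_conf in [0, 1]: a bias-only model's confidence on each gold label."""
    # Flatten the teacher distribution more where the bias model is confident,
    # so bias shortcuts stop paying off while other examples still teach.
    exponent = (1.0 - bias_conf).unsqueeze(1)
    scaled = teacher_probs ** exponent
    scaled = scaled / scaled.sum(dim=1, keepdim=True)
    return F.kl_div(F.log_softmax(student_logits, dim=-1), scaled,
                    reduction="batchmean")

loss = confidence_regularized_loss(torch.randn(8, 3),
                                   F.softmax(torch.randn(8, 3), dim=-1),
                                   torch.rand(8))
```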
This list is automatically generated from the titles and abstracts of the papers in this site.