Related papers: IBADR: an Iterative Bias-Aware Dataset Refinement Framework for Debiasing NLU models

URL: http://arxiv.org/abs/2311.00292v1
Date: Wed, 1 Nov 2023 04:50:38 GMT
Title: IBADR: an Iterative Bias-Aware Dataset Refinement Framework for Debiasing NLU models
Authors: Xiaoyue Wang, Xin Liu, Lijie Wang, Yaoxiang Wang, Jinsong Su and Hua Wu
Abstract summary: We propose IBADR, an Iterative Bias-Aware Dataset Refinement framework. We first train a shallow model to quantify the bias degree of samples in the pool. Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator. In this way, the generator can effectively learn the correspondence between bias indicators and samples.
Score: 52.03761198830643
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As commonly used methods for debiasing natural language understanding (NLU) models, dataset refinement approaches rely heavily on manual data analysis, and thus may be unable to cover all the potential biased features. In this paper, we propose IBADR, an Iterative Bias-Aware Dataset Refinement framework, which debiases NLU models without predefining biased features. We maintain an iteratively expanded sample pool. Specifically, at each iteration, we first train a shallow model to quantify the bias degree of samples in the pool. Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator. In this way, the generator can effectively learn the correspondence between bias indicators and samples. Furthermore, we employ the generator to produce pseudo samples with fewer biased features by feeding it specific bias indicators. Finally, we incorporate the generated pseudo samples into the pool. Experimental results and in-depth analyses on two NLU tasks show that IBADR not only significantly outperforms existing dataset refinement approaches, achieving SOTA, but is also compatible with model-centric methods.
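
The iteration described in the abstract can be sketched as a short loop. This is only a toy illustration under stated assumptions, not the authors' implementation: the shallow model is replaced by per-token label counts, the bias indicator is an integer bin, and the "generator" is a placeholder that shuffles the tokens of stored samples instead of a trained text generator. All function names (train_shallow_model, bias_degree, to_indicator, train_generator, generate, refine) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_shallow_model(pool):
    """Toy shallow model (assumption): label counts per token."""
    counts = defaultdict(Counter)
    for text, label in pool:
        for tok in text.split():
            counts[tok][label] += 1
    return counts


def bias_degree(model, text, label):
    """Mean probability the shallow model gives the gold label, in [0, 1]."""
    probs = []
    for tok in text.split():
        total = sum(model[tok].values())
        if total:
            probs.append(model[tok][label] / total)
    return sum(probs) / len(probs) if probs else 0.0


def to_indicator(degree, num_bins=5):
    """Discretize a bias degree into an integer bias indicator (bin index)."""
    return min(int(degree * num_bins), num_bins - 1)


def train_generator(extended):
    """Placeholder generator (assumption): group samples by indicator."""
    by_indicator = defaultdict(list)
    for indicator, text, label in extended:
        by_indicator[indicator].append((text, label))
    return by_indicator


def generate(generator, indicator, n, rng):
    """Placeholder sampling: shuffle tokens of samples with this indicator."""
    candidates = generator[indicator]
    pseudo = []
    for _ in range(n):
        text, label = rng.choice(candidates)
        toks = text.split()
        rng.shuffle(toks)
        pseudo.append((" ".join(toks), label))
    return pseudo


def refine(pool, iterations=2, per_iter=2, seed=0):
    """Iteratively expand the pool with pseudo samples of low bias degree."""
    rng = random.Random(seed)
    pool = list(pool)
    for _ in range(iterations):
        shallow = train_shallow_model(pool)
        # Pair each sample with a bias indicator ("extended samples").
        extended = [(to_indicator(bias_degree(shallow, t, y)), t, y)
                    for t, y in pool]
        generator = train_generator(extended)
        # Request the lowest bias indicator observed in this iteration.
        low_bias = min(generator)
        pool.extend(generate(generator, low_bias, per_iter, rng))
    return pool
```

In a realistic setting the placeholder generator would be a conditional language model that receives the bias indicator as part of its input, which is what lets it produce new text rather than reordered copies.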

Related papers

How I Met Your Bias: Investigating Bias Amplification in Diffusion Models [11.80771961784834]
Diffusion-based generative models demonstrate state-of-the-art performance across various image synthesis tasks. Previous research has viewed bias amplification as an inherent characteristic of diffusion models. We empirically demonstrate that samplers for diffusion models have a significant and measurable effect on bias amplification.
arXiv Detail & Related papers (2025-12-23T10:46:48Z)
BLADE: Bias-Linked Adaptive DEbiasing [2.7352017408152083]
BLADE is a generative debiasing framework that requires no prior knowledge of bias or bias-conflicting samples. We evaluate BLADE on multiple benchmark datasets and show that it significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-10-05T12:28:54Z)
DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection [9.801159950963306]
We propose DiffInject, a powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model. Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing.
arXiv Detail & Related papers (2024-06-10T09:45:38Z)
Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair [36.221761997349795]
Deep neural networks rely on bias attributes that are spuriously correlated with a target class in the presence of dataset bias. This paper proposes a method that provides the model with explicit spatial guidance that indicates the region of intrinsic features. Experiments demonstrate that our method achieves state-of-the-art performance on synthetic and real-world datasets with various levels of bias severity.
arXiv Detail & Related papers (2024-04-30T04:13:14Z)
Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint. We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b. We propose to mitigate dataset bias via either weighting the objective of each sample n by $\frac{1}{p(u_n|b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo Chamber [17.034228910493056]
This paper presents experimental analyses revealing that the existing biased models overfit to bias-conflicting samples in the training data. We propose a straightforward and effective method called Echoes, which trains a biased model and a target model with a different strategy. Our approach achieves superior debiasing results compared to the existing baselines on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-06T13:13:18Z)
Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets. We propose debiasing contrastive learning (DCT) to mitigate biased latent features while accounting for the dynamic nature of bias. DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space. GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
Learning Debiased Representation via Disentangled Feature Augmentation [19.348340314001756]
This paper presents an empirical analysis revealing that training with "diverse" bias-conflicting samples is crucial for debiasing. We propose a novel feature-level data augmentation technique in order to synthesize diverse bias-conflicting samples.
arXiv Detail & Related papers (2021-07-03T08:03:25Z)
One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function. Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases. First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method. The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
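
As a small illustration of the weighting idea summarized in "Revisiting the Dataset Bias Problem from a Statistical Perspective" above, the sketch below computes a per-sample weight of 1/p(u_n|b_n) from empirical counts. It assumes, purely for illustration, that the non-class attribute b is observed and discrete; the function name inverse_conditional_weights is hypothetical and this is not the paper's estimator.

```python
from collections import Counter


def inverse_conditional_weights(u, b):
    """Return 1 / p(u_n | b_n) per sample, with p estimated from counts.

    Assumption: both the class attribute u and the non-class attribute b
    are observed, discrete values, one per sample.
    """
    joint = Counter(zip(u, b))   # count(u, b)
    marginal_b = Counter(b)      # count(b)
    # 1 / p(u|b) = count(b) / count(u, b)
    return [marginal_b[bn] / joint[(un, bn)] for un, bn in zip(u, b)]


u = ["cat", "cat", "cat", "dog"]
b = ["grass", "grass", "grass", "grass"]
weights = inverse_conditional_weights(u, b)
```

Here the single "dog on grass" sample conflicts with the dominant correlation, so it receives a larger weight than each "cat on grass" sample. The same weights could be used either to scale per-sample losses or as sampling weights.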

This list is automatically generated from the titles and abstracts of the papers on this site.