Hybrid Sample Synthesis-based Debiasing of Classifier in Limited Data Setting
- URL: http://arxiv.org/abs/2312.08288v2
- Date: Wed, 20 Dec 2023 10:46:33 GMT
- Title: Hybrid Sample Synthesis-based Debiasing of Classifier in Limited Data Setting
- Authors: Piyush Arora, Pratik Mazumder
- Abstract summary: This paper focuses on a more practical setting with no prior information about the bias.
In this setting, there are a large number of bias-aligned samples that cause the model to produce biased predictions.
If the training data is limited, the bias-aligned samples may exert an even stronger influence on the model predictions.
- Score: 5.837881923712393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models are known to suffer from the problem of bias, and
researchers have been exploring methods to address this issue. However, most of
these methods require prior knowledge of the bias and are not always practical.
In this paper, we focus on a more practical setting with no prior information
about the bias. Generally, in this setting, there are a large number of
bias-aligned samples that cause the model to produce biased predictions and a
few bias-conflicting samples that do not conform to the bias. If the training
data is limited, the bias-aligned samples may exert an even stronger
influence on the model predictions, and we experimentally demonstrate that
existing debiasing techniques suffer severely in such cases. In this paper, we
examine the effects of unknown bias in small dataset regimes and present a
novel approach to mitigate this issue. The proposed approach directly addresses
the issue of the extremely low occurrence of bias-conflicting samples in
limited data settings through the synthesis of hybrid samples that can be used
to reduce the effect of bias. We perform extensive experiments on several
benchmark datasets and experimentally demonstrate the effectiveness of our
proposed approach in addressing any unknown bias in the presence of limited
data. Specifically, our approach outperforms the vanilla, LfF, LDD, and DebiAN
debiasing methods by absolute margins of 10.39%, 9.08%, 8.07%, and 9.67% when
only 10% of the Corrupted CIFAR-10 Type 1 dataset is available with a
bias-conflicting sample ratio of 0.05.
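The abstract describes the method only at a high level. As a rough illustration of what hybrid sample synthesis can look like, the sketch below blends suspected bias-conflicting images with bias-aligned ones via a mixup-style pixel interpolation; the pairing strategy, mixing distribution, and label assignment are our assumptions, not the authors' exact procedure.

```python
import torch

def synthesize_hybrids(x_conflict, y_conflict, x_aligned, alpha=0.5):
    """Blend each bias-conflicting image with a randomly chosen bias-aligned
    image via mixup-style pixel interpolation. The pairing and Beta mixing
    scheme are illustrative assumptions, not the paper's exact recipe."""
    idx = torch.randint(0, x_aligned.size(0), (x_conflict.size(0),))
    lam = torch.distributions.Beta(alpha, alpha).sample(
        (x_conflict.size(0), 1, 1, 1))
    lam = lam.clamp(max=0.5)  # keep the bias-conflicting content dominant
    x_hybrid = (1 - lam) * x_conflict + lam * x_aligned[idx]
    return x_hybrid, y_conflict  # hybrids inherit the conflicting labels
```

Training on such hybrids alongside the original data increases the effective number of samples that contradict the bias, which is the quantity the paper identifies as the bottleneck in limited-data settings.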
Related papers
- A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective [33.78421391776591]
In this paper, we propose a novel perspective of mislabeled sample detection.
We show that our new perspective can boost the precision of detection and rectify biased models effectively.
Our approach is complementary to existing methods, showing performance improvement even when applied to models that have already undergone recent debiasing techniques.
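The summary does not define self-influence; a common first-order proxy is the squared norm of a sample's own loss gradient, which measures how strongly that sample pulls the parameters toward fitting its label. The sketch below uses this proxy purely for illustration; the paper's actual estimator may differ.

```python
import torch
import torch.nn.functional as F

def self_influence_scores(model, xs, ys):
    """First-order self-influence proxy: squared norm of each sample's own
    loss gradient (an assumption; the paper's estimator may differ). High
    scores flag samples the model struggles to fit, e.g. mislabeled or
    bias-conflicting ones."""
    params = [p for p in model.parameters() if p.requires_grad]
    scores = []
    for x, y in zip(xs, ys):
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        scores.append(sum(g.pow(2).sum() for g in grads).item())
    return torch.tensor(scores)
```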
arXiv Detail & Related papers (2024-11-01T04:54:32Z) - Looking at Model Debiasing through the Lens of Anomaly Detection [11.113718994341733]
Deep neural networks are sensitive to bias in the data.
We propose a new bias identification method based on anomaly detection.
We reach state-of-the-art performance on synthetic and real benchmark datasets.
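One simple way to instantiate "bias identification as anomaly detection" is to fit an off-the-shelf outlier detector per class on features extracted by an intentionally biased reference model, so the rare bias-conflicting samples surface as anomalies. The detector choice and contamination rate below are our assumptions, not necessarily the paper's.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def flag_bias_conflicting(features, labels, contamination=0.05):
    """Per-class IsolationForest over (biased-model) features; outliers are
    candidate bias-conflicting samples. Detector and contamination rate are
    illustrative assumptions."""
    flags = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        det = IsolationForest(contamination=contamination, random_state=0)
        flags[idx] = det.fit_predict(features[idx]) == -1  # -1 marks outliers
    return flags
```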
arXiv Detail & Related papers (2024-07-24T17:30:21Z) - DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection [9.801159950963306]
We propose DiffInject, a powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model.
Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing.
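DiffInject works in the latent space of a pretrained diffusion model; as a simplified stand-in for the underlying operation, the sketch below performs an AdaIN-style statistic swap, re-styling a bias-aligned "content" latent with the channel statistics of a bias-conflicting "style" latent. It illustrates style injection only, not the diffusion pipeline.

```python
import torch

def inject_style(content, style, eps=1e-5):
    """AdaIN-style statistic swap (a simplified stand-in, not DiffInject's
    actual pipeline): re-normalize content latents with the style latents'
    per-channel mean and std."""
    c_mean = content.mean(dim=(-2, -1), keepdim=True)
    c_std = content.std(dim=(-2, -1), keepdim=True) + eps
    s_mean = style.mean(dim=(-2, -1), keepdim=True)
    s_std = style.std(dim=(-2, -1), keepdim=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean
```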
arXiv Detail & Related papers (2024-06-10T09:45:38Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$.
We propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n \mid b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n \mid b_n)}$.
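In code, the weighting variant is standard importance weighting; $p(u_n \mid b_n)$ must first be estimated, e.g. from co-occurrence counts when both attributes are discrete (the estimation step below is our assumption):

```python
import numpy as np

def inverse_propensity_weights(u, b):
    """Weight sample n by 1 / p(u_n | b_n), with the conditional estimated
    from empirical counts of the discrete class attribute u and bias
    attribute b (the estimation scheme is an assumption)."""
    u_vals, u_idx = np.unique(u, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    counts = np.zeros((len(u_vals), len(b_vals)))
    np.add.at(counts, (u_idx, b_idx), 1.0)
    p_u_given_b = counts / counts.sum(axis=0, keepdims=True)
    return 1.0 / p_u_given_b[u_idx, b_idx]
```

The resulting weights can either multiply per-sample losses or drive a weighted sampler, matching the two variants above.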
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - IBADR: an Iterative Bias-Aware Dataset Refinement Framework for
Debiasing NLU models [52.03761198830643]
We propose IBADR, an Iterative Bias-Aware dataset Refinement framework.
We first train a shallow model to quantify the bias degree of samples in the pool.
Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator.
In this way, the generator can effectively learn the correspondence between bias indicators and samples.
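Read as a loop, one IBADR-style refinement round looks roughly like the sketch below; the `bias_degree`, `fit`, and `sample` interfaces are hypothetical placeholders for illustration, not IBADR's actual components.

```python
def ibadr_round(pool, shallow_model, generator, low_bias=0.0, n_new=1000):
    """One schematic bias-aware refinement round (all object interfaces
    here are hypothetical placeholders, not IBADR's real API)."""
    # 1. The shallow model quantifies how bias-aligned each pooled sample is.
    scored = [(shallow_model.bias_degree(s), s) for s in pool]
    # 2. Train the generator on (bias indicator, sample) pairs so it learns
    #    what samples of a given bias degree look like.
    generator.fit(scored)
    # 3. Generate samples conditioned on a low bias indicator and add them
    #    to the pool, diluting the dataset's bias for the next round.
    pool.extend(generator.sample(indicator=low_bias, n=n_new))
    return pool
```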
arXiv Detail & Related papers (2023-11-01T04:50:38Z) - Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
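A minimal way to realize gradient-alignment-style balancing, under our own simplifying assumptions rather than the paper's exact formulation, is to rescale per-sample losses so the mined bias-conflicting group contributes a gradient of comparable magnitude to the bias-aligned group:

```python
import torch
import torch.nn.functional as F

def balanced_group_loss(logits, targets, is_conflicting, gamma=1.0):
    """Upweight the rare bias-conflicting group so its gradient roughly
    matches the bias-aligned group's (a simplified sketch; gamma and the
    scheme are assumptions, not the paper's exact GA rule)."""
    loss = F.cross_entropy(logits, targets, reduction="none")
    n_conf = is_conflicting.sum().clamp(min=1).float()
    n_align = (~is_conflicting).sum().clamp(min=1).float()
    weights = torch.where(is_conflicting,
                          gamma * n_align / n_conf,
                          torch.ones_like(loss))
    return (weights * loss).mean()
```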
arXiv Detail & Related papers (2023-02-22T14:50:24Z) - Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features while accounting for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Information-Theoretic Bias Reduction via Causal View of Spurious
Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against the algorithmic bias, which incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
arXiv Detail & Related papers (2022-01-10T01:19:31Z) - Learning Debiased Models with Dynamic Gradient Alignment and
Bias-conflicting Sample Mining [39.00256193731365]
Deep neural networks notoriously suffer from dataset biases which are detrimental to model robustness, generalization and fairness.
We propose a two-stage debiasing scheme to combat intractable unknown biases.
arXiv Detail & Related papers (2021-11-25T14:50:10Z) - Learning Debiased Representation via Disentangled Feature Augmentation [19.348340314001756]
This paper presents an empirical analysis revealing that training with "diverse" bias-conflicting samples is crucial for debiasing.
We propose a novel feature-level data augmentation technique in order to synthesize diverse bias-conflicting samples.
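Concretely, the feature-level augmentation can be pictured as swapping the bias part of the latent code between random pairs of samples, so intrinsic (class) and bias features disagree; the two-encoder split below follows the disentangled setup the summary describes, but the exact details are assumptions.

```python
import torch

def swap_bias_features(z_intrinsic, z_bias):
    """Pair each sample's intrinsic features with another sample's bias
    features via a random permutation, synthesizing bias-conflicting
    representations (encoder and concatenation details are assumptions)."""
    perm = torch.randperm(z_bias.size(0))
    return torch.cat([z_intrinsic, z_bias[perm]], dim=1)
```

The debiased classifier is then trained on these augmented features alongside the original ones.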
arXiv Detail & Related papers (2021-07-03T08:03:25Z)