On Counterfactual Data Augmentation Under Confounding
- URL: http://arxiv.org/abs/2305.18183v2
- Date: Tue, 21 Nov 2023 09:11:38 GMT
- Title: On Counterfactual Data Augmentation Under Confounding
- Authors: Abbavaram Gowtham Reddy, Saketh Bachu, Saloni Dash, Charchit Sharma,
Amit Sharma, Vineeth N Balasubramanian
- Abstract summary: Counterfactual data augmentation has emerged as a method to mitigate confounding biases in the training data.
These biases arise due to various observed and unobserved confounding variables in the data generation process.
We show how our simple augmentation method helps existing state-of-the-art methods achieve good results.
- Score: 30.76982059341284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual data augmentation has recently emerged as a method to mitigate
confounding biases in the training data. These biases, such as spurious
correlations, arise due to various observed and unobserved confounding
variables in the data generation process. In this paper, we formally analyze
how confounding biases impact downstream classifiers and present a causal
viewpoint to the solutions based on counterfactual data augmentation. We
explore how removing confounding biases serves as a means to learn invariant
features, ultimately aiding in generalization beyond the observed data
distribution. Additionally, we present a straightforward yet powerful algorithm
for generating counterfactual images, which effectively mitigates the influence
of confounding effects on downstream classifiers. Through experiments on MNIST
variants and the CelebA datasets, we demonstrate how our simple augmentation
method helps existing state-of-the-art methods achieve good results.
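The core idea in the abstract, that a confounder induces a spurious correlation which counterfactual augmentation can break, can be illustrated with a toy sketch. This is not the paper's image-generation algorithm: the binary "spurious attribute", the `agree_prob` parameter, and all function names below are assumptions made only for illustration. Each sample is paired with a counterfactual copy in which the spurious attribute is flipped and the label is kept.

```python
import random


def make_confounded_data(n, agree_prob=0.9, seed=0):
    """Toy data: a hidden confounder makes a spurious attribute agree with the label
    with probability `agree_prob` (hypothetical stand-in for, e.g., digit color)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        spurious = label if rng.random() < agree_prob else 1 - label
        data.append((spurious, label))
    return data


def augment_with_counterfactuals(data):
    """Add one counterfactual per sample: flip the spurious attribute, keep the label."""
    return data + [(1 - spurious, label) for spurious, label in data]


def agreement_rate(data):
    """Fraction of samples where the spurious attribute equals the label."""
    return sum(spurious == label for spurious, label in data) / len(data)


observed = make_confounded_data(1000)
augmented = augment_with_counterfactuals(observed)

before = agreement_rate(observed)   # high: the attribute is predictive of the label
after = agreement_rate(augmented)   # 0.5: the attribute carries no label information
```

In this toy setting, a classifier trained on `augmented` can no longer exploit the spurious attribute, since it agrees with the label exactly half the time. With real images the counterfactual copy would have to be produced by a generative model, which is the part the paper addresses.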
Related papers
- Boosting Model Resilience via Implicit Adversarial Data Augmentation [20.768174896574916]
We propose to augment the deep features of samples by incorporating adversarial and anti-adversarial perturbation distributions.
We then theoretically reveal that our augmentation process approximates the optimization of a surrogate loss function.
We conduct extensive experiments across four common biased learning scenarios.
arXiv Detail & Related papers (2024-04-25T03:22:48Z)
- Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps rather than the instantaneous input-output relationships of previous contexts.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms are highly dependent on timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
- Data augmentation and explainability for bias discovery and mitigation in deep learning [0.0]
This dissertation explores the impact of bias in deep neural networks and presents methods for reducing its influence on model performance.
The first part begins by categorizing and describing potential sources of bias and errors in data and models, with a particular focus on bias in machine learning pipelines.
The next chapter outlines a taxonomy and methods of Explainable AI as a way to justify predictions and control and improve the model.
arXiv Detail & Related papers (2023-08-18T11:02:27Z)
- Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- Is augmentation effective to improve prediction in imbalanced text datasets? [3.1690891866882236]
We argue that adjusting the cutoffs without data augmentation can produce similar results to oversampling techniques.
Our findings contribute to a better understanding of the strengths and limitations of different approaches to dealing with imbalanced data.
arXiv Detail & Related papers (2023-04-20T13:07:31Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate.
arXiv Detail & Related papers (2021-03-04T20:53:06Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- On Data Augmentation and Adversarial Risk: An Empirical Analysis [9.586672294115075]
We analyse the effect of different data augmentation techniques on the adversarial risk by three measures.
We disprove the hypothesis that an improvement in the classification performance induced by a data augmentation is always accompanied by an improvement in the risk under adversarial attack.
Our results reveal that the augmented data has more influence on the resulting models than the non-augmented data.
arXiv Detail & Related papers (2020-07-06T11:16:18Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.