On Data Augmentation and Adversarial Risk: An Empirical Analysis
- URL: http://arxiv.org/abs/2007.02650v1
- Date: Mon, 6 Jul 2020 11:16:18 GMT
- Title: On Data Augmentation and Adversarial Risk: An Empirical Analysis
- Authors: Hamid Eghbal-zadeh, Khaled Koutini, Paul Primus, Verena Haunschmid,
Michal Lewandowski, Werner Zellinger, Bernhard A. Moser, Gerhard Widmer
- Abstract summary: We analyse the effect of different data augmentation techniques on the adversarial risk using three measures.
We disprove the hypothesis that an improvement in classification performance induced by a data augmentation is always accompanied by an improvement in the risk under adversarial attack.
Our results reveal that the augmented data has more influence on the resulting models than the non-augmented data.
- Score: 9.586672294115075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation techniques have become standard practice in deep
learning, as they have been shown to greatly improve the generalisation
abilities of models. These techniques rely on different ideas such as
invariance-preserving transformations (e.g., expert-defined augmentations),
statistical heuristics (e.g., Mixup), and learning the data distribution
(e.g., GANs). However, in adversarial settings it remains unclear under what
conditions such data augmentation methods reduce or even worsen the
misclassification risk. In this paper, we therefore analyse the effect of
different data augmentation techniques on the adversarial risk using three
measures: (a) the well-known risk under adversarial attacks, (b) a new measure
of prediction-change stress based on the Laplacian operator, and (c) the
influence of training examples on prediction. The results of our empirical
analysis disprove the hypothesis that an improvement in classification
performance induced by a data augmentation is always accompanied by an
improvement in the risk under adversarial attack. Further, our results reveal
that the augmented data has more influence on the resulting models than the
non-augmented data. Taken together, our results suggest that general-purpose
data augmentations that do not take into account the characteristics of the
data and the task must be applied with care.
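The two most concrete ingredients above, a Mixup-style augmentation and measure (a), the risk under adversarial attack, can be sketched in a few lines. This is an illustrative numpy sketch, not the paper's code: `mixup_batch` and `fgsm_adversarial_risk` are hypothetical helpers, and the attack is shown for a linear logistic model only, using a one-step FGSM perturbation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mixup: train on convex combinations of example pairs and their labels."""
    rng = np.random.default_rng(rng)
    lam = float(rng.beta(alpha, alpha))      # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))           # random partner for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix

def fgsm_adversarial_risk(w, x, y, eps):
    """Measure (a): misclassification rate under a one-step FGSM attack
    against a linear logistic model p(y=1|x) = sigmoid(x @ w)."""
    p = sigmoid(x @ w)
    grad_x = (p - y)[:, None] * w[None, :]   # gradient of the log-loss w.r.t. inputs
    x_adv = x + eps * np.sign(grad_x)        # L-infinity bounded perturbation
    pred_adv = (sigmoid(x_adv @ w) >= 0.5).astype(float)
    return float(np.mean(pred_adv != y))
```

Comparing `fgsm_adversarial_risk` for a model fitted on raw data versus on `mixup_batch` outputs is the kind of comparison the abstract describes: better clean accuracy from an augmentation need not translate into lower adversarial risk.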
Related papers
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- On Counterfactual Data Augmentation Under Confounding [30.76982059341284]
Counterfactual data augmentation has emerged as a method to mitigate confounding biases in the training data.
These biases arise due to various observed and unobserved confounding variables in the data generation process.
We show how our simple augmentation method helps existing state-of-the-art methods achieve good results.
arXiv Detail & Related papers (2023-05-29T16:20:23Z)
- Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- Is augmentation effective to improve prediction in imbalanced text datasets? [3.1690891866882236]
We argue that adjusting the cutoffs without data augmentation can produce similar results to oversampling techniques.
Our findings contribute to a better understanding of the strengths and limitations of different approaches to dealing with imbalanced data.
arXiv Detail & Related papers (2023-04-20T13:07:31Z)
- DeepVol: Volatility Forecasting from High-Frequency Data with Dilated Causal Convolutions [53.37679435230207]
We propose DeepVol, a model based on Dilated Causal Convolutions that uses high-frequency data to forecast day-ahead volatility.
Our empirical results suggest that the proposed deep learning-based approach effectively learns global features from high-frequency data.
arXiv Detail & Related papers (2022-09-23T16:13:47Z)
- Augmentation-Aware Self-Supervision for Data-Efficient GAN Training [68.81471633374393]
Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting.
We propose a novel augmentation-aware self-supervised discriminator that predicts the augmentation parameter of the augmented data.
We compare our method with state-of-the-art (SOTA) methods using the class-conditional BigGAN and unconditional StyleGAN2 architectures.
arXiv Detail & Related papers (2022-05-31T10:35:55Z)
- Data Augmentation in the Underparameterized and Overparameterized Regimes [7.326504492614808]
We quantify how data augmentation affects the variance and limiting distribution of estimates.
The results confirm some observations made in machine learning practice, but also lead to unexpected findings.
arXiv Detail & Related papers (2022-02-18T11:32:41Z)
- Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box adversarial attack that aims to craft adversarial perturbations on the surrogate model and then apply such perturbations to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on the widely-used dataset demonstrate the effectiveness of our attack method with a 12.85% higher success rate of transfer attack compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and thereof gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
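The feature-averaging idea in the last entry, averaging a model's outputs over a set of augmentation transforms, can be sketched briefly. This is a minimal illustrative stand-in, not the paper's method: `model` and `transforms` are hypothetical placeholders.

```python
import numpy as np

def averaged_predict(model, x, transforms):
    """Average a model's predictions over a list of augmentation transforms,
    making the combined predictor invariant to those transforms."""
    outputs = [model(t(x)) for t in transforms]  # one prediction per transform
    return np.mean(outputs, axis=0)
```

With convex losses, the entry above reports that this kind of averaging reduces generalization error relative to plain data augmentation.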
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.