Inducing Data Amplification Using Auxiliary Datasets in Adversarial Training
- URL: http://arxiv.org/abs/2209.14053v1
- Date: Tue, 27 Sep 2022 09:21:40 GMT
- Title: Inducing Data Amplification Using Auxiliary Datasets in Adversarial Training
- Authors: Saehyung Lee and Hyungyu Lee
- Abstract summary: We propose a biased multi-domain adversarial training (BiaMAT) method that induces training data amplification on a primary dataset.
The proposed method can achieve increased adversarial robustness on a primary dataset by leveraging auxiliary datasets.
- Score: 7.513100214864646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent studies have shown that the use of extra in-distribution data
can lead to a high level of adversarial robustness. However, there is no
guarantee that it will always be possible to obtain sufficient extra data for a
selected dataset. In this paper, we propose a biased multi-domain adversarial
training (BiaMAT) method that induces training data amplification on a primary
dataset using publicly available auxiliary datasets, without requiring a
class-distribution match between the primary and auxiliary datasets. The
proposed method can achieve increased adversarial robustness on a primary
dataset by leveraging auxiliary datasets via multi-domain learning.
Specifically, data amplification on both robust and non-robust features can be
accomplished through the application of BiaMAT, as demonstrated by a
theoretical and empirical analysis. Moreover, we demonstrate that while
existing methods are vulnerable to negative transfer due to the distributional
discrepancy between auxiliary and primary data, the proposed method enables
neural networks to flexibly leverage diverse image datasets for adversarial
training by successfully handling the domain discrepancy through the
application of a confidence-based selection strategy. The pre-trained models
and code are available at: https://github.com/Saehyung-Lee/BiaMAT.
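To make the recipe concrete, below is a minimal PyTorch sketch of one multi-domain adversarial training step over a primary and an auxiliary batch. It is an illustration under stated assumptions, not the authors' implementation (see the repository above for that): `pgd_attack` is standard L-infinity PGD, the separate `pri_head`/`aux_head` classifiers stand in for multi-domain learning over disjoint label spaces, and the `conf_threshold` filter is a hypothetical stand-in for the paper's confidence-based selection strategy; all hyperparameter names and values are placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_attack(forward_fn, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD on inputs in [0, 1] (Madry et al., 2018)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(forward_fn(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()           # ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def multidomain_at_step(backbone, pri_head, aux_head, optimizer,
                        x_pri, y_pri, x_aux, y_aux,
                        aux_weight=0.5, conf_threshold=0.5):
    """One adversarial training step on a primary batch plus a filtered
    auxiliary batch; the confidence filter here is illustrative only."""
    pri_fwd = lambda x: pri_head(backbone(x))  # shared backbone, two heads
    aux_fwd = lambda x: aux_head(backbone(x))

    x_pri_adv = pgd_attack(pri_fwd, x_pri, y_pri)
    x_aux_adv = pgd_attack(aux_fwd, x_aux, y_aux)

    loss_pri = F.cross_entropy(pri_fwd(x_pri_adv), y_pri)

    # Keep only auxiliary examples the model is already confident about,
    # to limit negative transfer from out-of-distribution auxiliary data.
    logits_aux = aux_fwd(x_aux_adv)
    with torch.no_grad():
        conf = F.softmax(logits_aux, dim=1).gather(1, y_aux[:, None])[:, 0]
        keep = conf >= conf_threshold
    loss_aux = (F.cross_entropy(logits_aux[keep], y_aux[keep])
                if keep.any() else logits_aux.sum() * 0.0)

    loss = loss_pri + aux_weight * loss_aux
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The shared backbone is what lets amplification transfer in this sketch: adversarial gradients from auxiliary batches shape the same features the primary head uses, while `aux_weight` and the confidence filter bias training toward the primary task.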
Related papers
- First-Order Manifold Data Augmentation for Regression Learning [4.910937238451485]
We introduce FOMA: a new data-driven domain-independent data augmentation method.
We evaluate FOMA on in-distribution generalization and out-of-distribution benchmarks, and we show that it improves the generalization of several neural architectures.
arXiv Detail & Related papers (2024-06-16T12:35:05Z)
- A Comprehensive Survey on Data Augmentation [55.355273602421384]
Data augmentation is a technique that generates high-quality artificial data by manipulating existing data samples.
Existing literature surveys only focus on a certain type of specific modality data.
We propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities.
arXiv Detail & Related papers (2024-05-15T11:58:08Z)
- Group Distributionally Robust Dataset Distillation with Risk Minimization [18.07189444450016]
We introduce an algorithm that combines clustering with the minimization of a risk measure on the loss to conduct DD.
We demonstrate its effective generalization and robustness across subgroups through numerical experiments.
arXiv Detail & Related papers (2024-02-07T09:03:04Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms [35.29582673348303]
Sampling biases in training data are a major source of algorithmic biases in machine learning systems.
We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources.
arXiv Detail & Related papers (2022-04-13T23:14:05Z)
- Lightweight Data Fusion with Conjugate Mappings [11.760099863897835]
We present an approach to data fusion that combines the interpretability of structured probabilistic graphical models with the flexibility of neural networks.
The proposed method, lightweight data fusion (LDF), emphasizes posterior analysis over latent variables using two types of information.
arXiv Detail & Related papers (2020-11-20T19:47:13Z)
- Sequential Targeting: an incremental learning approach for data imbalance in text classification [7.455546102930911]
Methods to handle imbalanced datasets are crucial for alleviating distributional skews.
We propose a novel training method, Sequential Targeting (ST), that is independent of the effectiveness of the representation method.
We demonstrate the effectiveness of our method through experiments on simulated benchmark datasets (IMDB) and data collected from NAVER.
arXiv Detail & Related papers (2020-11-20T04:54:00Z)
- Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns from both unlabeled target data and labeled source data via two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By embedding samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.