Negative Data Augmentation
- URL: http://arxiv.org/abs/2102.05113v1
- Date: Tue, 9 Feb 2021 20:28:35 GMT
- Title: Negative Data Augmentation
- Authors: Abhishek Sinha, Kumar Ayush, Jiaming Song, Burak Uzkent, Hongxia Jin,
Stefano Ermon
- Abstract summary: We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
- Score: 127.28042046152954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation is often used to enlarge datasets with synthetic samples
generated in accordance with the underlying data distribution. To enable a
wider range of augmentations, we explore negative data augmentation strategies
(NDA) that intentionally create out-of-distribution samples. We show that such
negative out-of-distribution samples provide information on the support of the
data distribution, and can be leveraged for generative modeling and
representation learning. We introduce a new GAN training objective where we use
NDA as an additional source of synthetic data for the discriminator. We prove
that under suitable conditions, optimizing the resulting objective still
recovers the true data distribution but can directly bias the generator towards
avoiding samples that lack the desired structure. Empirically, models trained
with our method achieve improved conditional/unconditional image generation
along with improved anomaly detection capabilities. Further, we incorporate the
same negative data augmentation strategy in a contrastive learning framework
for self-supervised representation learning on images and videos, achieving
improved performance on downstream image classification, object detection, and
action recognition tasks. These results suggest that prior knowledge on what
does not constitute valid data is an effective form of weak supervision across
a range of unsupervised learning tasks.
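To make the proposed GAN objective concrete, below is a minimal PyTorch-style sketch of a discriminator update that treats NDA samples as additional fake data. The `jigsaw` transform (patch shuffling, one kind of NDA), the mixing weight `lam`, and the module names are illustrative assumptions under this reading of the abstract, not the paper's exact implementation.

```python
# Sketch: NDA-GAN discriminator step. NDA samples are scored as "fake",
# so the discriminator learns to reject the out-of-distribution region
# they cover, biasing the generator away from it.
import torch
import torch.nn.functional as F

def jigsaw(x, grid=2):
    """One possible NDA transform: shuffle image patches so global
    structure is destroyed while local statistics are preserved.
    Assumes H and W are divisible by `grid`."""
    b, c, h, w = x.shape
    ph, pw = h // grid, w // grid
    patches = (x.unfold(2, ph, ph).unfold(3, pw, pw)   # B,C,g,g,ph,pw
                .reshape(b, c, grid * grid, ph, pw))
    perm = torch.randperm(grid * grid, device=x.device)
    patches = patches[:, :, perm].reshape(b, c, grid, grid, ph, pw)
    return patches.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)

def discriminator_loss(discriminator, generator, real, z, lam=0.5):
    """Non-saturating GAN loss where the fake batch mixes generator
    samples with NDA-transformed real images; `lam` is an assumed
    hyperparameter, not a value from the paper."""
    fake = generator(z).detach()
    nda = jigsaw(real)
    d_real, d_fake, d_nda = (discriminator(t) for t in (real, fake, nda))
    loss_real = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    loss_nda = F.binary_cross_entropy_with_logits(d_nda, torch.zeros_like(d_nda))
    return loss_real + lam * loss_fake + (1.0 - lam) * loss_nda
```

The generator update is unchanged; only the discriminator's notion of "fake" is widened to include NDA samples.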
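For the contrastive setting, a similarly hedged sketch of how NDA could plug into an InfoNCE-style loss: the NDA-transformed anchor is scored as one extra negative per image, reusing the hypothetical `jigsaw` transform from the sketch above. The `encoder` and temperature `tau` are assumptions for illustration.

```python
# Sketch: InfoNCE with an extra NDA negative. The encoder must separate
# an image from its own structure-destroyed (jigsaw) version, which
# shares local statistics but not global structure.
import torch
import torch.nn.functional as F

def info_nce_with_nda(encoder, view1, view2, tau=0.1):
    """`view1`/`view2` are two standard (positive) augmentations of the
    same image batch; jigsaw(view1) supplies the NDA negatives."""
    z1 = F.normalize(encoder(view1), dim=1)             # anchors,   B x D
    z2 = F.normalize(encoder(view2), dim=1)             # positives, B x D
    z_nda = F.normalize(encoder(jigsaw(view1)), dim=1)  # NDA negatives

    pos = (z1 * z2).sum(dim=1, keepdim=True)            # B x 1
    neg_batch = z1 @ z2.t()                             # B x B, other images
    mask = torch.eye(z1.size(0), dtype=torch.bool, device=z1.device)
    neg_batch = neg_batch.masked_fill(mask, float('-inf'))  # drop positive pairs
    neg_nda = (z1 * z_nda).sum(dim=1, keepdim=True)     # B x 1

    logits = torch.cat([pos, neg_batch, neg_nda], dim=1) / tau
    labels = torch.zeros(z1.size(0), dtype=torch.long, device=z1.device)
    return F.cross_entropy(logits, labels)  # positive sits at index 0
```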
Related papers
- Few-shot Online Anomaly Detection and Segmentation [29.693357653538474]
This paper addresses the challenging yet practical few-shot online anomaly detection and segmentation (FOADS) task.
Under the FOADS framework, models are first trained on a few-shot normal dataset and then refined by leveraging unlabeled streaming data that contains both normal and abnormal samples.
To achieve strong performance with limited training samples, we employ multi-scale feature embeddings extracted from a CNN pre-trained on ImageNet to obtain a robust representation.
arXiv Detail & Related papers (2024-03-27T02:24:00Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Feedback-guided Data Synthesis for Imbalanced Classification [10.836265321046561]
We introduce a framework for augmenting static datasets with useful synthetic samples.
We find that the samples must be close to the support of the real data of the task at hand, and be sufficiently diverse.
On ImageNet-LT, we achieve state-of-the-art results, with over 4 percent improvement on underrepresented classes.
arXiv Detail & Related papers (2023-09-29T21:47:57Z)
- Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement [42.45888600367566]
Directed generation aims to generate samples with desired properties as measured by a reward function.
We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels.
arXiv Detail & Related papers (2023-07-13T20:20:40Z)
- Leaving Reality to Imagination: Robust Classification via Generated Datasets [24.411444438920988]
Recent research on robustness has revealed significant performance gaps between neural image classifiers trained on datasets similar to the test set and those trained on data from shifted distributions.
We study the question: How do generated datasets influence the natural robustness of image classifiers?
We find that ImageNet classifiers trained on real data augmented with generated data achieve higher accuracy and effective robustness than standard training.
arXiv Detail & Related papers (2023-02-05T22:49:33Z)
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is believed to incur a high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
- Augmentation-Aware Self-Supervision for Data-Efficient GAN Training [68.81471633374393]
Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting.
We propose a novel augmentation-aware self-supervised discriminator that predicts the augmentation parameter of the augmented data.
We compare our method with state-of-the-art (SOTA) methods using the class-conditional BigGAN and unconditional StyleGAN2 architectures.
arXiv Detail & Related papers (2022-05-31T10:35:55Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, which makes it appealing to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we propose incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate observational data collected offline, which is often abundant in practice, to improve sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)