The Value of Out-of-Distribution Data
- URL: http://arxiv.org/abs/2208.10967v4
- Date: Mon, 10 Jul 2023 09:15:22 GMT
- Title: The Value of Out-of-Distribution Data
- Authors: Ashwin De Silva, Rahul Ramesh, Carey E. Priebe, Pratik Chaudhari,
Joshua T. Vogelstein
- Abstract summary: We show that the generalization error of a task can be a non-monotonic function of the number of OOD samples.
In other words, there is value in training on small amounts of OOD data.
- Score: 28.85184823032929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We expect the generalization error to improve with more samples from a
similar task, and to deteriorate with more samples from an out-of-distribution
(OOD) task. In this work, we show a counter-intuitive phenomenon: the
generalization error of a task can be a non-monotonic function of the number of
OOD samples. As the number of OOD samples increases, the generalization error
on the target task improves before deteriorating beyond a threshold. In other
words, there is value in training on small amounts of OOD data. We use Fisher's
Linear Discriminant on synthetic datasets and deep networks on computer vision
benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS and DomainNet to demonstrate
and analyze this phenomenon. In the idealistic setting where we know which
samples are OOD, we show that these non-monotonic trends can be exploited using
an appropriately weighted objective of the target and OOD empirical risk. While
its practical utility is limited, this does suggest that if we can detect OOD
samples, then there may be ways to benefit from them. When we do not know which
samples are OOD, we show how a number of go-to strategies such as
data-augmentation, hyper-parameter optimization, and pre-training are not
enough to ensure that the target generalization error does not deteriorate with
the number of OOD samples in the dataset.
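The abstract's "appropriately weighted objective of the target and OOD empirical risk" can be illustrated with a minimal sketch. The abstract does not give the exact form of the weighting, so the convex combination below, the parameter name `alpha`, and the function names `weighted_risk` and `pooled_risk` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Average of a sequence; an empty sequence contributes zero risk."""
    return sum(values) / len(values) if values else 0.0


def weighted_risk(target_losses: Sequence[float],
                  ood_losses: Sequence[float],
                  alpha: float) -> float:
    """Assumed form: alpha * (target empirical risk) + (1 - alpha) * (OOD empirical risk).

    alpha = 1 ignores the OOD samples; alpha = 0 ignores the target samples.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * mean(target_losses) + (1.0 - alpha) * mean(ood_losses)


def pooled_risk(target_losses: Sequence[float],
                ood_losses: Sequence[float]) -> float:
    """Unweighted baseline: average the loss over all samples, target and OOD alike."""
    return mean(list(target_losses) + list(ood_losses))


# Hypothetical per-sample losses: 2 target samples, 4 OOD samples.
target = [1.0, 1.0]
ood = [3.0, 3.0, 3.0, 3.0]

# Pooling is the special case alpha = n / (n + m), so the implicit weight on
# the target task shrinks as the number of OOD samples m grows.
implicit_alpha = len(target) / (len(target) + len(ood))
pooled = pooled_risk(target, ood)
reweighted = weighted_risk(target, ood, alpha=0.8)
```

Under this assumed form, training on the pooled data gives the target task a weight that shrinks as OOD samples are added, whereas fixing `alpha` keeps the target risk's share of the objective constant. Choosing `alpha` requires knowing which samples are OOD, which is the idealistic setting the abstract describes.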
Related papers
- Model Reprogramming Outperforms Fine-tuning on Out-of-distribution Data in Text-Image Encoders [56.47577824219207]
In this paper, we unveil the hidden costs associated with intrusive fine-tuning techniques.
We introduce a new model reprogramming approach for fine-tuning, which we name Reprogrammer.
Our empirical evidence reveals that Reprogrammer is less intrusive and yields superior downstream models.
arXiv Detail & Related papers (2024-03-16T04:19:48Z)
- Mixture Data for Training Cannot Ensure Out-of-distribution Generalization [21.801115344132114]
We show that increasing the size of training data does not always lead to a reduction in the test generalization error.
In this work, we quantitatively redefine OOD data as those situated outside the convex hull of mixed training data.
Our proof of the new risk bound agrees that the efficacy of well-trained models can be guaranteed for unseen data.
arXiv Detail & Related papers (2023-12-25T11:00:38Z)
- Out-of-distribution Detection with Implicit Outlier Transformation [72.73711947366377]
Outlier exposure (OE) is powerful in out-of-distribution (OOD) detection.
We propose a novel OE-based approach that makes the model perform well for unseen OOD situations.
arXiv Detail & Related papers (2023-03-09T04:36:38Z)
- ReSmooth: Detecting and Utilizing OOD Samples when Training with Data Augmentation [57.38418881020046]
Recent data augmentation (DA) techniques always meet the need for diversity in augmented training samples.
An augmentation strategy that has a high diversity usually introduces out-of-distribution (OOD) augmented samples.
We propose ReSmooth, a framework that first detects OOD samples among the augmented samples and then leverages them.
arXiv Detail & Related papers (2022-05-25T09:29:27Z)
- Understanding, Detecting, and Separating Out-of-Distribution Samples and Adversarial Samples in Text Classification [80.81532239566992]
We compare the two types of anomalies (OOD and Adv samples) with the in-distribution (ID) ones from three aspects.
We find that OOD samples expose their aberration starting from the first layer, while the abnormalities of Adv samples do not emerge until the deeper layers of the model.
We propose a simple method to separate ID, OOD, and Adv samples using the hidden representations and output probabilities of the model.
arXiv Detail & Related papers (2022-04-09T12:11:59Z)
- Training OOD Detectors in their Natural Habitats [31.565635192716712]
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild.
Recent methods use auxiliary outlier data to regularize the model for improved OOD detection.
We propose a novel framework that leverages wild mixture data -- that naturally consists of both ID and OOD samples.
arXiv Detail & Related papers (2022-02-07T15:38:39Z)
- Label Smoothed Embedding Hypothesis for Out-of-Distribution Detection [72.35532598131176]
We propose an unsupervised method to detect OOD samples using a $k$-NN density estimate.
We leverage a recent insight about label smoothing, which we call the Label Smoothed Embedding Hypothesis.
We show that our proposal outperforms many OOD baselines and also provide new finite-sample high-probability statistical results.
arXiv Detail & Related papers (2021-02-09T21:04:44Z)
- On The Consistency Training for Open-Set Semi-Supervised Learning [44.046578996049654]
We study how OOD samples affect training in both low- and high-dimensional spaces.
Our method makes better use of OOD samples and achieves state-of-the-art results.
arXiv Detail & Related papers (2021-01-19T12:38:17Z)
- Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection [76.39067237772286]
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z)
- Detecting Out-of-Distribution Examples with In-distribution Examples and Gram Matrices [8.611328447624679]
Deep neural networks yield confident, incorrect predictions when presented with Out-of-Distribution examples.
In this paper, we propose to detect OOD examples by identifying inconsistencies between activity patterns and the predicted class.
We find that characterizing activity patterns by Gram matrices and identifying anomalies in Gram matrix values can yield high OOD detection rates.
arXiv Detail & Related papers (2019-12-28T19:44:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.