How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
- URL: http://arxiv.org/abs/2402.03502v1
- Date: Mon, 5 Feb 2024 20:36:33 GMT
- Title: How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
- Authors: Xuefeng Du, Zhen Fang, Ilias Diakonikolas, Yixuan Li
- Abstract summary: Harnessing unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and out-of-distribution (OOD) data.
This paper introduces a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness.
- Score: 63.41681272937562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using unlabeled data to regularize machine learning models has
demonstrated promise for improving safety and reliability in detecting
out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild
data is non-trivial due to the heterogeneity of both in-distribution (ID) and
OOD data. This lack of a clean set of OOD samples poses significant challenges
in learning an optimal OOD classifier. Currently, there is a lack of research
on formally understanding how unlabeled data helps OOD detection. This paper
bridges the gap by introducing a new learning framework SAL (Separate And
Learn) that offers both strong theoretical guarantees and empirical
effectiveness. The framework separates candidate outliers from the unlabeled
data and then trains an OOD classifier using the candidate outliers and the
labeled ID data. Theoretically, we provide rigorous error bounds from the lens
of separability and learnability, formally justifying the two components in our
algorithm. Our theory shows that SAL can separate the candidate outliers with
small error rates, which leads to a generalization guarantee for the learned
OOD classifier. Empirically, SAL achieves state-of-the-art performance on
common benchmarks, reinforcing our theoretical insights. Code is publicly
available at https://github.com/deeplearning-wisc/sal.
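To make the two-stage pipeline concrete, here is a minimal sketch. It uses a generic energy-based filtering score as a stand-in for the paper's actual separation procedure, and the loaders, threshold, and `model.features` attribute are hypothetical; see the repository above for the authors' code.

```python
# A minimal sketch of SAL's separate-then-learn pipeline (not the authors' code).
# Stage 1 uses an energy-style score (assumption) to flag candidate outliers;
# Stage 2 trains a binary ID-vs-outlier head on ID data plus the candidates.
import torch
import torch.nn as nn

def separate_candidate_outliers(model, wild_loader, threshold, device="cpu"):
    """Stage 1: flag wild samples whose ID-ness score falls below a threshold."""
    model.eval()
    candidates = []
    with torch.no_grad():
        for x in wild_loader:  # wild_loader yields unlabeled batches (assumption)
            x = x.to(device)
            logits = model(x)
            # Negative free energy: higher means more ID-like (Liu et al., 2020).
            score = torch.logsumexp(logits, dim=-1)
            candidates.append(x[score < threshold].cpu())
    return torch.cat(candidates) if candidates else torch.empty(0)

def train_ood_classifier(model, binary_head, id_loader, outliers, epochs=1, device="cpu"):
    """Stage 2: train a binary head on labeled ID data vs. candidate outliers."""
    opt = torch.optim.SGD(binary_head.parameters(), lr=1e-3, momentum=0.9)
    bce = nn.BCEWithLogitsLoss()
    out_loader = torch.utils.data.DataLoader(outliers, batch_size=64, shuffle=True)
    for _ in range(epochs):
        for (x_id, _), x_out in zip(id_loader, out_loader):
            x = torch.cat([x_id, x_out]).to(device)
            y = torch.cat([torch.ones(len(x_id)), torch.zeros(len(x_out))]).to(device)
            with torch.no_grad():
                feats = model.features(x)  # hypothetical feature-extractor attribute
            loss = bce(binary_head(feats).squeeze(-1), y)
            opt.zero_grad(); loss.backward(); opt.step()
```

The split mirrors the theory: the smaller the separation error in stage 1, the cleaner the supervision available to the binary classifier in stage 2.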
Related papers
- What If the Input is Expanded in OOD Detection? [77.37433624869857]
Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes.
Various scoring functions have been proposed to distinguish them from in-distribution (ID) data.
We introduce a novel perspective: applying common corruptions in the input space.
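One plausible reading of this idea, sketched below under assumptions: average a confidence-style score over several corrupted copies of each input. The specific corruptions and the averaging rule are illustrative, not the paper's exact recipe.

```python
# Hedged sketch: score an input over its corrupted copies and average.
import torch

def gaussian_noise(x, std=0.05):
    return x + std * torch.randn_like(x)

def brightness(x, delta=0.1):
    return (x + delta).clamp(0, 1)

CORRUPTIONS = [lambda x: x, gaussian_noise, brightness]  # identity + corruptions

@torch.no_grad()
def expanded_score(model, x):
    """Average a max-softmax confidence score over the expanded input space."""
    scores = []
    for corrupt in CORRUPTIONS:
        logits = model(corrupt(x))
        scores.append(logits.softmax(-1).max(-1).values)
    return torch.stack(scores).mean(0)  # higher => more ID-like
```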
arXiv Detail & Related papers (2024-10-24T06:47:28Z)
- RICASSO: Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure [21.809270017579806]
Deep learning models often face challenges from both imbalanced (long-tailed) and out-of-distribution (OOD) data.
Our research shows that data mixing can generate pseudo-OOD data that exhibit the features of both in-distribution (ID) data and OOD data.
We propose a unified framework called Reinforced Imbalance Learning with Class-Aware Self-Supervised Outliers Exposure (RICASSO).
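A hedged illustration of mixing-based pseudo-OOD generation; the Beta-sampled coefficient and random pairing are assumptions rather than RICASSO's exact design.

```python
# Illustrative mixup-style generator for pseudo-OOD samples from ID batches.
import torch

def make_pseudo_ood(x_id, alpha=1.0):
    """Mix each ID image with a shuffled partner; the blend carries features
    of both sources and lies off the ID manifold."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x_id.size(0))
    return lam * x_id + (1.0 - lam) * x_id[perm]
```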
arXiv Detail & Related papers (2024-10-14T14:29:32Z)
- When and How Does In-Distribution Label Help Out-of-Distribution Detection? [38.874518492468965]
This paper develops a formal understanding of how ID labels affect OOD detection.
We employ a graph-theoretic approach, rigorously analyzing the separability of ID data from OOD data in a closed-form manner.
We present empirical results on both simulated and real datasets, validating theoretical guarantees and reinforcing our insights.
arXiv Detail & Related papers (2024-05-28T22:34:53Z)
- EAT: Towards Long-Tailed Out-of-Distribution Detection [55.380390767978554]
This paper addresses the challenging task of long-tailed OOD detection.
The main difficulty lies in distinguishing OOD data from samples belonging to the tail classes.
We propose two simple ideas: (1) Expanding the in-distribution class space by introducing multiple abstention classes, and (2) Augmenting the context-limited tail classes by overlaying images onto the context-rich OOD data.
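A minimal sketch of both ideas as stated, with assumed tensor shapes and a fixed paste box; neither is presented as the paper's exact implementation.

```python
# (1) a classification head widened by k abstention classes, and
# (2) pasting a tail-class patch onto a context-rich OOD background.
import torch
import torch.nn as nn

class AbstentionHead(nn.Module):
    """Linear head over C in-distribution classes plus k abstention classes."""
    def __init__(self, feat_dim, num_classes, k_abstain=3):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes + k_abstain)

    def forward(self, feats):
        # OOD training targets point at the abstention logits.
        return self.fc(feats)

def overlay_on_ood(tail_img, ood_img, box=(8, 8, 24, 24)):
    """Paste a tail-class patch onto an OOD image (images assumed >= 24 px)."""
    y0, x0, y1, x1 = box
    out = ood_img.clone()
    out[..., y0:y1, x0:x1] = tail_img[..., y0:y1, x0:x1]
    return out
```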
arXiv Detail & Related papers (2023-12-14T13:47:13Z)
- Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization [114.43504951058796]
Outlier detection plays a critical role in AI safety.
Deep neural network classifiers often assign out-of-distribution (OOD) inputs to in-distribution classes with high confidence.
We propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for the OOD detection tasks.
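As background for the Hilbert-Schmidt independence machinery named in the title, here is the standard biased HSIC estimator with Gaussian kernels; how the paper wires HSIC into its OOD objective is not reproduced here.

```python
# Biased HSIC estimate: (1/(n-1)^2) * trace(K H L H), Gaussian kernels.
import torch

def gaussian_gram(x, sigma=1.0):
    d2 = torch.cdist(x, x).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Measures statistical dependence between samples x and y; 0 iff independent
    (in the population limit, with characteristic kernels)."""
    n = x.size(0)
    h = torch.eye(n) - torch.ones(n, n) / n  # centering matrix
    k, l = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2
```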
arXiv Detail & Related papers (2022-09-26T15:59:55Z)
- Training OOD Detectors in their Natural Habitats [31.565635192716712]
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild.
Recent methods use auxiliary outlier data to regularize the model for improved OOD detection.
We propose a novel framework that leverages wild mixture data, which naturally consists of both ID and OOD samples.
arXiv Detail & Related papers (2022-02-07T15:38:39Z)
- Provably Robust Detection of Out-of-distribution Data (almost) for free [124.14121487542613]
Deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data.
In this paper, we propose a novel method that combines, from first principles, a certifiable OOD detector with a standard classifier into an OOD-aware classifier.
In this way we achieve the best of both worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, with no loss in prediction accuracy and close to state-of-the-art OOD detection performance on non-manipulated OOD data.
arXiv Detail & Related papers (2021-06-08T11:40:49Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
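A toy sketch of "relabel instead of discard", assuming a softmax-confidence score and a dedicated OOD class; both the score and the threshold are illustrative, not the paper's procedure.

```python
# Relabel likely-OOD instances so the model still learns from them.
import torch

def relabel_low_confidence(logits, labels, ood_class, tau=0.5):
    """Instances whose max-softmax confidence falls below tau are reassigned
    to a dedicated OOD class instead of being dropped from training."""
    conf = logits.softmax(-1).max(-1).values
    return torch.where(conf < tau, torch.full_like(labels, ood_class), labels)
```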
arXiv Detail & Related papers (2020-02-11T21:08:06Z)