How Out-of-Distribution Detection Learning Theory Enhances Transformer: Learnability and Reliability
- URL: http://arxiv.org/abs/2406.12915v5
- Date: Tue, 20 May 2025 15:15:17 GMT
- Title: How Out-of-Distribution Detection Learning Theory Enhances Transformer: Learnability and Reliability
- Authors: Yijin Zhou, Yutang Ge, Xiaowen Dong, Yuguang Wang
- Abstract summary: This paper introduces the OOD detection Probably Approximately Correct (PAC) Theory for transformers. It shows that, under suitable conditions, outliers can be accurately represented and distinguished given sufficient data. This approach yields a novel algorithm that ensures learnability and refines the decision boundaries between inliers and outliers.
- Score: 10.056026416603006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers excel in natural language processing and computer vision tasks. However, they still face challenges in generalizing to Out-of-Distribution (OOD) datasets, i.e., data whose distribution differs from that seen during training. OOD detection aims to distinguish outliers while preserving in-distribution (ID) performance. This paper introduces the OOD detection Probably Approximately Correct (PAC) Theory for transformers, which establishes the conditions on data distribution and model configuration under which OOD detection is learnable by transformers. It shows that, under suitable conditions, outliers can be accurately represented and distinguished given sufficient data. The theoretical analysis highlights a trade-off between theoretical principles and practical training paradigms. By examining this trade-off, we naturally derive the rationale for leveraging auxiliary outliers to enhance OOD detection. Our theory suggests that by penalizing the misclassification of outliers within the loss function and strategically generating soft synthetic outliers, one can robustly bolster the reliability of transformer networks. This approach yields a novel algorithm that ensures learnability and refines the decision boundaries between inliers and outliers. In practice, the algorithm consistently achieves state-of-the-art (SOTA) performance across various data formats.
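The abstract stops short of spelling out the training objective. As a rough illustration only, the sketch below combines cross-entropy on ID data with a penalty that pushes predictions on soft synthetic outliers toward the uniform distribution. The interpolation-based outlier generation and all function names here are our assumptions, not the paper's actual algorithm.

```python
import torch
import torch.nn.functional as F

def soft_synthetic_outliers(id_feats: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical 'soft' outliers: convex mixes of shuffled ID features.
    The paper's actual generation strategy may differ."""
    perm = torch.randperm(id_feats.size(0))
    lam = torch.distributions.Beta(alpha, alpha).sample((id_feats.size(0), 1))
    return lam * id_feats + (1 - lam) * id_feats[perm]

def ood_aware_loss(logits_id, labels, logits_out, beta: float = 0.5):
    """Cross-entropy on ID data plus a penalty for misclassifying outliers,
    realized here as KL divergence to the uniform distribution over classes."""
    ce = F.cross_entropy(logits_id, labels)
    k = logits_out.size(-1)
    uniform = torch.full_like(logits_out, 1.0 / k)
    penalty = F.kl_div(F.log_softmax(logits_out, dim=-1), uniform, reduction="batchmean")
    return ce + beta * penalty
```

Pushing outlier predictions toward uniform is one standard way to "penalize the misclassification of outliers"; the paper may use a different penalty.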
Related papers
- Revisiting Energy-Based Model for Out-of-Distribution Detection [23.39953997547791]
Outlier Exposure by Simple Transformations (OEST) is a framework that enhances OOD detection by leveraging "peripheral-distribution" (PD) data.
PD data are samples generated through simple data transformations, thus providing an efficient alternative to manually curated outliers.
OEST* achieves better or similar accuracy compared with state-of-the-art methods.
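The abstract describes an energy-based model trained with peripheral-distribution data. A minimal sketch, assuming the standard energy score and a hinge-style margin regularizer (as in Liu et al.'s energy-based OOD detection); the specific transformation and margin values are illustrative guesses, not OEST's exact recipe:

```python
import torch

def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Standard energy score: lower energy ~ more in-distribution."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

def peripheral_samples(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical 'simple transformation' (here, a 180-degree rotation of
    image tensors); OEST's actual transformation set may differ."""
    return torch.rot90(x, k=2, dims=(-2, -1))

def energy_margin_loss(logits_id, logits_pd, m_id=-25.0, m_pd=-7.0):
    """Assumed regularizer: push ID energy below m_id and
    peripheral-distribution energy above m_pd."""
    e_id, e_pd = energy_score(logits_id), energy_score(logits_pd)
    return (torch.relu(e_id - m_id) ** 2).mean() + (torch.relu(m_pd - e_pd) ** 2).mean()
```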
arXiv Detail & Related papers (2024-12-04T06:25:26Z)
- Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z)
- What If the Input is Expanded in OOD Detection? [77.37433624869857]
Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes.
Various scoring functions have been proposed to distinguish them from in-distribution (ID) data.
We introduce a novel perspective, i.e., employing different common corruptions on the input space.
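One simple instantiation of this perspective, assuming nothing beyond the abstract: average a confidence score over the clean input and several corrupted copies, on the intuition that ID confidence survives mild corruption while OOD confidence collapses. The corruption set and aggregation rule below are hypothetical.

```python
import numpy as np

def max_softmax(logits: np.ndarray) -> float:
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return float(p.max())

def expanded_input_score(x, model, corruptions) -> float:
    """Average confidence over the clean input and corrupted views;
    'model' is any callable mapping an input array to logits."""
    views = [x] + [c(x) for c in corruptions]
    return float(np.mean([max_softmax(model(v)) for v in views]))

# Example corruptions (illustrative choices, not the paper's exact set):
corruptions = [
    lambda x: x + np.random.normal(0, 0.05, x.shape),  # Gaussian noise
    lambda x: np.clip(x * 0.8, 0, 1),                  # contrast dimming
]
```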
arXiv Detail & Related papers (2024-10-24T06:47:28Z)
- Enhancing OOD Detection Using Latent Diffusion [3.4899193297791054]
Out-of-distribution (OOD) detection is crucial for the reliable deployment of machine learning models in real-world scenarios. Recent efforts have explored using generative models, such as Stable Diffusion, to synthesize outlier data in the pixel space. We propose Outlier-Aware Learning (OAL), a novel framework that generates synthetic OOD training data within the latent space.
arXiv Detail & Related papers (2024-06-24T11:01:43Z)
- Mitigating Overconfidence in Out-of-Distribution Detection by Capturing Extreme Activations [1.8531577178922987]
"Overconfidence" is an intrinsic property of certain neural network architectures, leading to poor OOD detection.
We measure extreme activation values in the penultimate layer of neural networks and then leverage this proxy of overconfidence to improve on several OOD detection baselines.
Compared to the baselines, our method often grants substantial improvements, with double-digit increases in OOD detection.
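A hedged sketch of the idea as we read it: quantify how extreme the penultimate-layer activations are, then use that statistic to temper a baseline OOD score. The quantile proxy and the rectification rule below are our assumptions; the paper's exact combination may differ.

```python
import numpy as np

def extreme_activation(penultimate: np.ndarray, q: float = 0.99) -> float:
    """Proxy for overconfidence: a high quantile of the absolute
    activation values in the penultimate layer."""
    return float(np.quantile(np.abs(penultimate), q))

def rectified_score(base_score: float, penultimate: np.ndarray,
                    typical_extreme: float) -> float:
    """Assumed rectification: shrink the baseline OOD score when activations
    are abnormally extreme relative to what was typical on training data."""
    ratio = extreme_activation(penultimate) / max(typical_extreme, 1e-8)
    return base_score / max(ratio, 1.0)
```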
arXiv Detail & Related papers (2024-05-21T10:14:50Z)
- How Does Unlabeled Data Provably Help Out-of-Distribution Detection? [63.41681272937562]
Harnessing unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and out-of-distribution (OOD) data.
This paper introduces a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness.
arXiv Detail & Related papers (2024-02-05T20:36:33Z)
- Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources [73.28967478098107]
Out-of-distribution (OOD) detection discerns OOD data, on which the predictor cannot make valid predictions as it does on in-distribution (ID) data.
It is typically hard to collect real out-of-distribution (OOD) data for training a predictor capable of discerning OOD patterns.
We propose a data-generation-based learning method named Auxiliary Task-based OOD Learning (ATOL) that mitigates mistaken OOD generation.
arXiv Detail & Related papers (2023-11-06T16:26:52Z)
- Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness [4.724825031148413]
We present a novel method for detecting OOD data in Transformers based on transformation smoothness between intermediate layers of a network.
We evaluate BLOOD on several text classification tasks with Transformer networks and demonstrate that it outperforms methods with comparable resource requirements.
Our analysis also suggests that when learning simpler tasks, OOD data transformations maintain their original sharpness, whereas sharpness increases with more complex tasks.
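One way to make "transformation smoothness" concrete, under our own assumptions: estimate the squared Frobenius norm of the Jacobian of the map from one layer's representation to the next with Hutchinson-style random probes (for v with identity covariance, E||J^T v||^2 = ||J||_F^2). Whether BLOOD uses exactly this estimator is not stated in the summary above.

```python
import torch

def between_layer_sharpness(h_prev: torch.Tensor, h_next: torch.Tensor,
                            n_probes: int = 4) -> torch.Tensor:
    """Monte Carlo estimate of ||J||_F^2 for the map h_prev -> h_next
    (larger = sharper transformation). h_prev must require grad and
    h_next must be computed from it within the current graph."""
    est = torch.zeros((), device=h_next.device)
    for _ in range(n_probes):
        v = torch.randn_like(h_next)
        (g,) = torch.autograd.grad(h_next, h_prev, grad_outputs=v,
                                   retain_graph=True)
        est = est + (g ** 2).sum()
    return est / n_probes
```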
arXiv Detail & Related papers (2023-10-04T13:59:45Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
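A minimal sketch of how this observation could drive risk-sensitive decisions, assuming the "constant value" is approximated by the training label marginal (one candidate; the paper characterizes its own optimal constant solution): score how far the prediction has collapsed toward the constant and abstain when it is close.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def constant_proximity(logits: np.ndarray, train_label_marginal: np.ndarray) -> float:
    """L1 distance of the predictive distribution from a constant solution;
    a small distance suggests the input is far OOD."""
    return float(np.linalg.norm(softmax(logits) - train_label_marginal, ord=1))

def decide(logits, marginal, tau: float = 0.2):
    """Assumed risk-sensitive rule: abstain when the prediction has
    reverted toward the constant solution."""
    if constant_proximity(logits, marginal) < tau:
        return "abstain"
    return int(np.argmax(logits))
```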
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Out-of-distribution Detection with Implicit Outlier Transformation [72.73711947366377]
Outlier exposure (OE) is powerful in out-of-distribution (OOD) detection.
We propose a novel OE-based approach that makes the model perform well for unseen OOD situations.
arXiv Detail & Related papers (2023-03-09T04:36:38Z)
- Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over the state of the art and can serve as a simple yet strong baseline in this under-developed area.
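A sketch of the two ingredients the abstract implies: the standard energy score per node, and a propagation step that smooths node energies over the graph. The propagation coefficients and iteration count below are guesses, not GNNSafe's exact configuration.

```python
import numpy as np

def node_energy(logits: np.ndarray, t: float = 1.0) -> np.ndarray:
    """Energy score per node from GNN logits of shape (n_nodes, n_classes);
    computed with a max-shift for numerical stability."""
    m = logits.max(axis=1, keepdims=True)
    return -t * (np.log(np.exp((logits - m) / t).sum(axis=1)) + m.squeeze(1) / t)

def propagate_energy(e: np.ndarray, adj: np.ndarray, k: int = 2,
                     alpha: float = 0.5) -> np.ndarray:
    """Assumed propagation: mix each node's energy with the mean energy of
    its neighbors (row-normalized adjacency), repeated k times, so a node's
    OOD score reflects its neighborhood."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    p = adj / deg
    for _ in range(k):
        e = alpha * e + (1 - alpha) * (p @ e)
    return e
```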
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
- Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data [7.221206118679026]
We show that existing post-hoc methods perform quite differently on selective classification in the presence of OOD data (SCOD) than when evaluated only on OOD detection.
We propose a novel method for SCOD, Softmax Information Retaining Combination (SIRC), that augments softmax-based confidence scores with feature-agnostic information.
Experiments on a wide variety of ImageNet-scale datasets and convolutional neural network architectures show that SIRC is able to consistently match or outperform the baseline for SCOD.
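A heavily hedged sketch of an SIRC-style combination: retain a primary softmax score s1 while letting a secondary score s2 (e.g., a feature-norm statistic) further depress confidence on likely-OOD inputs. The functional form and parameter names below are our assumptions; consult the paper for the actual formula.

```python
import numpy as np

def sirc_like_combination(s1: float, s2: float, s1_max: float = 1.0,
                          b: float = 1.0, c: float = 0.0) -> float:
    """Multiplicative combination (assumed form): near s1_max the score stays
    high regardless of s2; for lower s1, a small s2 (OOD-like) inflates the
    penalty term and pushes the combined score down further."""
    return -(s1_max - s1) * (1.0 + np.exp(-b * (s2 - c)))
```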
arXiv Detail & Related papers (2022-07-15T14:39:57Z)
- Igeood: An Information Geometry Approach to Out-of-Distribution Detection [35.04325145919005]
We introduce Igeood, an effective method for detecting out-of-distribution (OOD) samples.
Igeood applies to any pre-trained neural network and works under various degrees of access to the underlying model.
We show that Igeood outperforms competing state-of-the-art methods on a variety of network architectures and datasets.
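For categorical distributions the Fisher-Rao geodesic distance has the closed form d(p, q) = 2 arccos(Σᵢ √(pᵢ qᵢ)), which is the kind of information-geometric quantity Igeood builds on. The scoring rule below (minimum distance to per-class mean softmax prototypes) is an illustrative choice of ours, not necessarily the paper's.

```python
import numpy as np

def fisher_rao_categorical(p: np.ndarray, q: np.ndarray) -> float:
    """Closed-form Fisher-Rao distance between two categorical distributions."""
    return float(2.0 * np.arccos(np.clip(np.sqrt(p * q).sum(), 0.0, 1.0)))

def igeood_like_score(p_test: np.ndarray, class_prototypes: np.ndarray) -> float:
    """Assumed score: geodesic distance from the test softmax output to the
    nearest class prototype (here, per-class mean softmax vectors);
    larger distance suggests OOD."""
    return min(fisher_rao_categorical(p_test, proto) for proto in class_prototypes)
```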
arXiv Detail & Related papers (2022-03-15T11:26:35Z)
- Training OOD Detectors in their Natural Habitats [31.565635192716712]
Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild.
Recent methods use auxiliary outlier data to regularize the model for improved OOD detection.
We propose a novel framework that leverages wild mixture data, which naturally consists of both ID and OOD samples.
arXiv Detail & Related papers (2022-02-07T15:38:39Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence from labeled source data and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds the threshold.
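ATC is simple enough to sketch directly: calibrate the threshold so that the fraction of source validation examples above it matches source accuracy, then apply the same threshold to unlabeled target confidences. A minimal sketch, with the quantile-based calibration as our implementation choice:

```python
import numpy as np

def learn_atc_threshold(source_conf: np.ndarray, source_correct: np.ndarray) -> float:
    """Choose t so the fraction of source examples with confidence > t
    equals the source accuracy (solved here via a quantile)."""
    acc = source_correct.mean()
    return float(np.quantile(source_conf, 1.0 - acc))

def predict_target_accuracy(target_conf: np.ndarray, t: float) -> float:
    """ATC estimate: predicted target accuracy = fraction of unlabeled
    target examples whose confidence exceeds the learned threshold."""
    return float((target_conf > t).mean())
```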
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Transformer Uncertainty Estimation with Hierarchical Stochastic Attention [8.95459272947319]
We propose a novel way to enable transformers to have the capability of uncertainty estimation.
This is achieved by learning a hierarchical self-attention that attends to values and a set of learnable centroids.
We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets.
arXiv Detail & Related papers (2021-12-27T16:43:31Z)
- Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty [58.144520501201995]
Bi-Lipschitz regularization of neural network layers preserves relative distances between data instances in the feature space of each layer.
With the use of an attentive set encoder, we propose to meta learn either diagonal or diagonal plus low-rank factors to efficiently construct task specific covariance matrices.
We also propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution.
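The diagonal-plus-low-rank parameterization is concrete enough to illustrate: Σ = diag(d) + VVᵀ, with Mahalanobis distances computable via the Woodbury identity without forming a dim×dim inverse. Using this distance as the energy is our assumption here; the meta-learning of d and V is not shown.

```python
import numpy as np

def mahalanobis_dplr(x: np.ndarray, mu: np.ndarray, d: np.ndarray,
                     v: np.ndarray) -> float:
    """Squared Mahalanobis distance under Sigma = diag(d) + V @ V.T
    (d > 0 of shape (dim,), V of shape (dim, r)), via Woodbury:
    Sigma^-1 = D^-1 - D^-1 V (I + V^T D^-1 V)^-1 V^T D^-1."""
    r = x - mu
    dinv_r = r / d                       # D^-1 r
    dinv_v = v / d[:, None]              # D^-1 V
    vt_dinv_r = dinv_v.T @ r             # V^T D^-1 r
    small = np.linalg.solve(np.eye(v.shape[1]) + v.T @ dinv_v, vt_dinv_r)
    return float(r @ dinv_r - vt_dinv_r @ small)
```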
arXiv Detail & Related papers (2021-10-12T22:04:19Z)
- OODformer: Out-Of-Distribution Detection Transformer [15.17006322500865]
In real-world safety-critical applications, it is important to know whether a new data point is OOD.
This paper proposes a first-of-its-kind OOD detection architecture named OODformer.
arXiv Detail & Related papers (2021-07-19T15:46:38Z)
- Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection [76.39067237772286]
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
arXiv Detail & Related papers (2020-12-10T16:55:13Z)
- Deep Learning of Dynamic Subsurface Flow via Theory-guided Generative Adversarial Network [0.0]
A theory-guided generative adversarial network (TgGAN) is proposed to solve dynamic partial differential equations (PDEs) for subsurface flow with heterogeneous model parameters.
Numerical results demonstrate that the TgGAN model is robust and reliable for deep learning of dynamic PDEs.
arXiv Detail & Related papers (2020-06-02T02:53:26Z)