PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
- URL: http://arxiv.org/abs/2410.20631v2
- Date: Mon, 13 Jan 2025 23:45:51 GMT
- Title: PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
- Authors: Tianhao Zhang, Zhixiang Chen, Lyudmila S. Mihaylova
- Abstract summary: We introduce Prior-augmented Vision Transformer (PViT) to enhance the robustness of ViT models for image Out-of-Distribution (OOD) detection.
PViT shapes the decision boundary between ID and OOD by utilizing the proposed prior guided confidence.
PViT significantly outperforms existing SOTA OOD detection methods in terms of FPR95 and AUROC.
- Score: 10.724906455759854
- Abstract: Vision Transformers (ViTs) have achieved remarkable success across various vision tasks, yet their robustness against data distribution shifts and inherent inductive biases remains underexplored. To enhance the robustness of ViT models for image Out-of-Distribution (OOD) detection, we introduce a novel and generic framework named Prior-augmented Vision Transformer (PViT). Taking as input the prior class logits from a pretrained model, we train PViT to predict the class logits. During inference, PViT identifies OOD samples by quantifying the divergence between the predicted class logits and the prior logits obtained from pretrained models. Unlike existing state-of-the-art (SOTA) OOD detection methods, PViT shapes the decision boundary between ID and OOD by utilizing the proposed prior guided confidence, without requiring additional data modeling, generation methods, or structural modifications. Extensive experiments on the large-scale ImageNet benchmark, evaluated against over seven OOD datasets, demonstrate that PViT significantly outperforms existing SOTA OOD detection methods in terms of FPR95 and AUROC. The codebase is publicly available at https://github.com/RanchoGoose/PViT.
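The abstract describes scoring a sample by the divergence between the prior logits and the predicted logits, and reports results as FPR95 and AUROC. The sketch below illustrates that idea only. The abstract does not give the exact form of the prior guided confidence, so the KL divergence used here is a stand-in assumption, and the function names (`divergence_score`, `auroc`, `fpr_at_95_tpr`) are hypothetical rather than taken from the PViT codebase.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def divergence_score(prior_logits, predicted_logits, eps=1e-12):
    """Per-sample KL(prior || predicted); higher is treated as more OOD.

    Assumption: KL divergence is used purely for illustration, not as
    the scoring rule defined in the paper.
    """
    p = softmax(np.asarray(prior_logits, dtype=float))
    q = softmax(np.asarray(predicted_logits, dtype=float))
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample scores above a random ID sample."""
    id_col = np.asarray(id_scores, dtype=float)[:, None]
    ood_row = np.asarray(ood_scores, dtype=float)[None, :]
    greater = (ood_row > id_col).mean()
    ties = (ood_row == id_col).mean()
    return float(greater + 0.5 * ties)


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when 95% of ID samples are accepted."""
    # A sample is accepted as ID when its score is at or below the threshold.
    threshold = np.percentile(np.asarray(id_scores, dtype=float), 95)
    return float(np.mean(np.asarray(ood_scores, dtype=float) <= threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prior = rng.normal(size=(8, 5))                        # toy prior logits
    pred_id = prior + 0.1 * rng.normal(size=prior.shape)   # predictions close to the prior
    pred_ood = rng.normal(size=prior.shape)                # predictions unrelated to the prior
    id_scores = divergence_score(prior, pred_id)
    ood_scores = divergence_score(prior, pred_ood)
    print("AUROC:", auroc(id_scores, ood_scores))
    print("FPR95:", fpr_at_95_tpr(id_scores, ood_scores))
```

In this toy setup the ID predictions stay close to the prior and the OOD predictions do not, so the divergence separates the two groups. Real inputs would be logits from a pretrained model and from the trained PViT.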
Related papers
- Can OOD Object Detectors Learn from Foundation Models? [56.03404530594071]
Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data.
Inspired by recent advancements in text-to-image generative models, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples.
We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models.
arXiv Detail & Related papers (2024-09-08T17:28:22Z)
- Mitigating Overconfidence in Out-of-Distribution Detection by Capturing Extreme Activations [1.8531577178922987]
"Overconfidence" is an intrinsic property of certain neural network architectures, leading to poor OOD detection.
We measure extreme activation values in the penultimate layer of neural networks and then leverage this proxy of overconfidence to improve on several OOD detection baselines.
Compared to the baselines, our method often grants substantial improvements, with double-digit increases in OOD detection.
arXiv Detail & Related papers (2024-05-21T10:14:50Z)
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- Combining pre-trained Vision Transformers and CIDER for Out Of Domain Detection [0.774971301405295]
Most industrial pipelines rely on pre-trained models for downstream tasks such as CNN or Vision Transformers.
This paper investigates the performance of those models on the task of out-of-domain detection.
arXiv Detail & Related papers (2023-09-06T14:41:55Z)
- Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models [68.12229916000584]
We develop an out-of-distribution (OOD) benchmark termed Do-GOOD for the fine-Grained analysis on Document image-related tasks.
We then evaluate the robustness and perform a fine-grained analysis of 5 latest VDU pre-trained models and 2 typical OOD generalization algorithms.
arXiv Detail & Related papers (2023-06-05T06:50:42Z)
- Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness.
We propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z)
- Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over state-of-the-art methods and could serve as a simple yet strong baseline in such an under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
- How Useful are Gradients for OOD Detection Really? [5.459639971144757]
Out of distribution (OOD) detection is a critical challenge in deploying highly performant machine learning models in real-life applications.
We provide an in-depth analysis and comparison of gradient based methods for OOD detection.
We propose a general, non-gradient based method of OOD detection which improves over previous baselines in both performance and computational efficiency.
arXiv Detail & Related papers (2022-05-20T21:10:05Z)
- OODformer: Out-Of-Distribution Detection Transformer [15.17006322500865]
In real-world safety-critical applications, it is important to be aware if a new data point is OOD.
This paper proposes a first-of-its-kind OOD detection architecture named OODformer.
arXiv Detail & Related papers (2021-07-19T15:46:38Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.