Adapting Contrastive Language-Image Pretrained (CLIP) Models for
Out-of-Distribution Detection
- URL: http://arxiv.org/abs/2303.05828v2
- Date: Thu, 9 Nov 2023 10:23:29 GMT
- Title: Adapting Contrastive Language-Image Pretrained (CLIP) Models for
Out-of-Distribution Detection
- Authors: Nikolas Adaloglou and Felix Michels and Tim Kaiser and Markus Kollmann
- Abstract summary: We present a comprehensive experimental study on pretrained feature extractors for visual out-of-distribution (OOD) detection.
We propose a simple and scalable method called *pseudo-label probing* (PLP) that adapts vision-language models for OOD detection.
- Score: 1.597617022056624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive experimental study on pretrained feature
extractors for visual out-of-distribution (OOD) detection, focusing on adapting
contrastive language-image pretrained (CLIP) models. Without fine-tuning on the
training data, we are able to establish a positive correlation ($R^2\geq0.92$)
between in-distribution classification and unsupervised OOD detection for CLIP
models in $4$ benchmarks. We further propose a new simple and scalable method
called \textit{pseudo-label probing} (PLP) that adapts vision-language models
for OOD detection. Given a set of label names of the training set, PLP trains a
linear layer using the pseudo-labels derived from the text encoder of CLIP. To
test the OOD detection robustness of pretrained models, we develop a novel
feature-based adversarial OOD data manipulation approach to create adversarial
samples. Intriguingly, we show that (i) PLP outperforms the previous
state-of-the-art \citep{ming2022mcm} on all $5$ large-scale benchmarks based on
ImageNet, specifically by an average AUROC gain of 3.4\% using the largest CLIP
model (ViT-G), (ii) linear probing outperforms fine-tuning by large margins for
CLIP architectures (e.g. CLIP ViT-H achieves a mean gain of 7.3\% AUROC across
all ImageNet-based benchmarks), and (iii)
billion-parameter CLIP models still fail at detecting adversarially manipulated
OOD images. The code and adversarially created datasets will be made publicly
available.
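The PLP recipe described in the abstract can be sketched end to end: derive pseudo-labels by matching image features against text embeddings of the label names, fit a linear probe on the frozen features, and use max-softmax probability as the OOD score. The random embeddings, dimensions, and plain gradient-descent probe below are illustrative stand-ins; in practice the features come from frozen CLIP image/text encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-ins for frozen CLIP embeddings (assumption: in the paper
# these come from CLIP encoders; sizes here are illustrative only).
n_images, n_classes, dim = 200, 5, 64
image_feats = rng.normal(size=(n_images, dim))
text_feats = rng.normal(size=(n_classes, dim))  # one embedding per label name

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def pseudo_labels(img, txt):
    """Assign each image the class whose text embedding is most similar."""
    sims = l2_normalize(img) @ l2_normalize(txt).T
    return sims.argmax(axis=1)

def train_linear_probe(feats, labels, n_classes, lr=0.1, epochs=200):
    """Softmax regression on frozen features via plain gradient descent."""
    W = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        W -= lr * feats.T @ (probs - onehot) / len(feats)
    return W

def ood_score(feats, W):
    """Max-softmax probability: higher values suggest in-distribution."""
    logits = feats @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)

y = pseudo_labels(image_feats, text_feats)
W = train_linear_probe(l2_normalize(image_feats), y, n_classes)
scores = ood_score(l2_normalize(image_feats), W)
```

Note that the probe never sees ground-truth labels: supervision comes entirely from the text encoder, which is what makes the method scalable to unlabeled in-distribution data.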
Related papers
- LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies [22.100031612580356]
We tackle the challenge of predicting models' Out-of-Distribution (OOD) performance using in-distribution (ID) measurements without requiring OOD data.
We introduce the Lowest Common Ancestor (LCA)-on-the-Line framework, which measures the hierarchical distance between labels and predictions within a predefined class hierarchy.
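The hierarchical distance at the heart of LCA-on-the-Line can be illustrated on a toy class tree; the taxonomy below is a made-up example (real ImageNet benchmarks typically use the WordNet hierarchy), and the distance is counted as hops from the predicted label up to the lowest common ancestor of prediction and ground truth.

```python
# Hypothetical toy taxonomy as a child -> parent map (assumption: stands in
# for a real hierarchy such as WordNet).
parent = {
    "tabby": "cat", "siamese": "cat",
    "beagle": "dog", "husky": "dog",
    "cat": "mammal", "dog": "mammal",
    "mammal": "entity",
}

def ancestors(node):
    """Path from a node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lca_distance(a, b):
    """Edges from `a` up to the lowest common ancestor of `a` and `b`."""
    anc_b = set(ancestors(b))
    for hops, node in enumerate(ancestors(a)):
        if node in anc_b:
            return hops
    raise ValueError("nodes share no common ancestor")
```

Under this measure, confusing "tabby" with "siamese" (both cats) is a smaller error than confusing "tabby" with "beagle", which is the severity signal the framework correlates with OOD performance.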
arXiv Detail & Related papers (2024-07-22T21:54:19Z)
- SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation [5.590633742488972]
Out-of-distribution (OOD) detection is crucial for the safe deployment of neural networks.
We propose SeTAR, a training-free OOD detection method.
SeTAR enhances OOD detection via post-hoc modification of the model's weight matrices using a simple greedy search algorithm.
Our work offers a scalable, efficient solution for OOD detection, setting a new state-of-the-art in this area.
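The selective low-rank idea can be sketched as SVD truncation of a single weight matrix plus a greedy search over candidate ranks; the `score_fn` below is a stand-in for the held-out OOD-detection metric the real method would optimize, so this is a minimal sketch rather than SeTAR's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 16))  # a stand-in weight matrix

def low_rank(W, k):
    """Rank-k truncated-SVD reconstruction of a weight matrix."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def greedy_rank_search(W, score_fn):
    """Try progressively smaller ranks, keeping the best-scoring truncation.

    `score_fn` stands in for an OOD-detection metric evaluated on held-out
    data (assumption); higher is better.
    """
    best_k, best_score = W.shape[0], score_fn(W)
    for k in range(W.shape[0] - 1, 0, -1):
        s = score_fn(low_rank(W, k))
        if s > best_score:
            best_k, best_score = k, s
    return best_k, best_score
```

Because the modification is post hoc, no gradient updates are needed, which is what makes this family of methods training-free.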
arXiv Detail & Related papers (2024-06-18T13:55:13Z)
- Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Detecting out-of-distribution (OOD) samples is crucial when deploying machine learning models in open-world scenarios.
We propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLMs) to envision potential Outlier Exposure, termed EOE.
EOE can be generalized to different tasks, including far, near, and fine-grained OOD detection.
EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
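The envisioned-outlier idea can be sketched with placeholder text embeddings: score a test image by how much more it resembles the in-distribution label names than the LLM-generated outlier names. The random vectors and the similarity-difference scoring rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder embeddings (assumption: EOE would embed ID class names and
# LLM-envisioned outlier names with a CLIP-style text encoder).
dim = 32
id_text = rng.normal(size=(10, dim))       # in-distribution label names
outlier_text = rng.normal(size=(15, dim))  # envisioned outlier names

def l2n(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def eoe_style_score(img_feat, id_text, outlier_text):
    """Best similarity to an ID name minus best similarity to an envisioned
    outlier name; low values suggest OOD (illustrative rule only)."""
    f = l2n(img_feat)
    sim_id = (l2n(id_text) @ f).max()
    sim_out = (l2n(outlier_text) @ f).max()
    return float(sim_id - sim_out)
```

The appeal of this family of methods is that the "outlier exposure" set is synthesized from label names alone, so no real OOD images are needed at any point.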
arXiv Detail & Related papers (2024-06-02T17:09:48Z)
- Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- A framework for benchmarking class-out-of-distribution detection and its application to ImageNet [15.929238800072195]
We present a novel framework to benchmark the ability of image classifiers to detect class-out-of-distribution instances.
We apply this technique to ImageNet, and 525 pretrained, publicly available, ImageNet-1k classifiers.
arXiv Detail & Related papers (2023-02-23T09:57:48Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
We present in this paper a novel prompt-based scheme to train the UIC model, making full use of its powerful generalization ability.
arXiv Detail & Related papers (2022-05-26T03:13:43Z)
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can help models achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
- $k$Folden: $k$-Fold Ensemble for Out-Of-Distribution Detection [31.10536251430344]
Out-of-Distribution (OOD) detection is an important problem in natural language processing (NLP).
We propose a framework $k$Folden, which mimics the behaviors of OOD detection during training without the use of any external data.
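The $k$-fold idea can be sketched with toy sub-models: train one model per held-out class subset, then score a test input by the ensemble's average confidence. The nearest-centroid "sub-models" and Gaussian feature clusters below are illustrative assumptions standing in for real trained classifiers.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup (assumption): 6 known classes with Gaussian feature clusters,
# standing in for text-classification features.
n_classes, dim, per_class = 6, 8, 30
means = rng.normal(scale=3.0, size=(n_classes, dim))
X = np.concatenate([means[c] + rng.normal(size=(per_class, dim))
                    for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

def centroid_model(X, y, keep):
    """Nearest-centroid 'sub-model' trained only on the `keep` label subset."""
    return {c: X[y == c].mean(axis=0) for c in keep}

def submodel_probs(model, x):
    """Softmax over negative distances to the centroids this sub-model knows."""
    d = np.array([np.linalg.norm(x - mu) for mu in model.values()])
    e = np.exp(-(d - d.min()))  # shift for numerical stability
    return e / e.sum()

def kfolden_score(models, x):
    """Average max class probability across the ensemble; each sub-model has
    never seen one fold of classes, mimicking OOD exposure during training."""
    return float(np.mean([submodel_probs(m, x).max() for m in models]))

# One sub-model per held-out class (k equals the number of classes here).
models = [centroid_model(X, y, [c for c in range(n_classes) if c != held_out])
          for held_out in range(n_classes)]
score = kfolden_score(models, means[0])  # score a sample near class 0
```

The key property is that no external OOD data is ever used: each sub-model's held-out classes play the role of outliers during training.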
arXiv Detail & Related papers (2021-08-29T01:52:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.