Adapting Contrastive Language-Image Pretrained (CLIP) Models for
Out-of-Distribution Detection
- URL: http://arxiv.org/abs/2303.05828v2
- Date: Thu, 9 Nov 2023 10:23:29 GMT
- Title: Adapting Contrastive Language-Image Pretrained (CLIP) Models for
Out-of-Distribution Detection
- Authors: Nikolas Adaloglou and Felix Michels and Tim Kaiser and Markus Kollmann
- Abstract summary: We present a comprehensive experimental study on pretrained feature extractors for visual out-of-distribution (OOD) detection.
We propose a simple and scalable method called pseudo-label probing (PLP) that adapts vision-language models for OOD detection.
- Score: 1.597617022056624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive experimental study on pretrained feature
extractors for visual out-of-distribution (OOD) detection, focusing on adapting
contrastive language-image pretrained (CLIP) models. Without fine-tuning on the
training data, we establish a positive correlation (R² ≥ 0.92) between
in-distribution classification and unsupervised OOD detection for CLIP models
across 4 benchmarks. We further propose a simple and scalable method called
pseudo-label probing (PLP) that adapts vision-language models for OOD
detection. Given the label names of the training set, PLP trains a linear
layer using the pseudo-labels derived from the text encoder of CLIP. To test
the OOD detection robustness of pretrained models, we develop a novel
feature-based adversarial OOD data manipulation approach to create adversarial
samples. Intriguingly, we show that (i) PLP outperforms the previous
state-of-the-art (Ming et al., 2022) on all 5 large-scale ImageNet-based
benchmarks, with an average AUROC gain of 3.4% using the largest CLIP model
(ViT-G); (ii) linear probing outperforms fine-tuning by large margins for CLIP
architectures (e.g., CLIP ViT-H achieves a mean AUROC gain of 7.3% across all
ImageNet-based benchmarks); and (iii) billion-parameter CLIP models still fail
at detecting adversarially manipulated OOD images. The code and adversarially
created datasets will be made publicly available.
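A minimal PyTorch sketch of PLP as the abstract describes it, under assumptions not spelled out above: image and text features are precomputed and L2-normalized, and OOD scoring uses maximum softmax probability (the scoring rule is our assumption):
```python
import torch
import torch.nn.functional as F

def pseudo_label_probe(img_feats, txt_feats, epochs=50, lr=1e-3):
    """Pseudo-label probing (PLP), sketched from the abstract.

    img_feats: (N, D) L2-normalized CLIP image features of the training set.
    txt_feats: (C, D) L2-normalized CLIP text features, one per class name.
    """
    # Pseudo-label each image with its nearest class prompt in CLIP space.
    with torch.no_grad():
        pseudo = (img_feats @ txt_feats.T).argmax(dim=1)  # (N,)

    # Train a linear probe on the frozen image features.
    probe = torch.nn.Linear(img_feats.shape[1], txt_feats.shape[0])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(probe(img_feats), pseudo)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return probe

def msp_score(probe, feats):
    # Maximum softmax probability; lower values suggest OOD inputs.
    # (The abstract does not name the score function; MSP is an assumption.)
    with torch.no_grad():
        return F.softmax(probe(feats), dim=1).max(dim=1).values
```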
Related papers
- CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance [14.849943391904882]
We propose CLIP-PING: Contrastive Language-Image Pre-training with Proximus Intrinsic Neighbors Guidance.
CLIP-PING bootstraps unimodal features extracted from arbitrary pre-trained encoders to obtain intrinsic guidance of proximus neighbors.
Experiments reveal that CLIP-PING notably surpasses its peers in zero-shot generalization and cross-modal retrieval tasks.
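The neighbor-guidance idea can be illustrated with a simple nearest-neighbor lookup in a feature bank; this is a loose reading of the summary, not CLIP-PING's actual objective:
```python
import torch

def nearest_neighbor(feats, bank):
    """Retrieve each feature's nearest "intrinsic neighbor" from a support
    bank of unimodal features (all rows L2-normalized). Illustrative only;
    CLIP-PING's actual guidance objective is more involved.
    feats: (B, D), bank: (M, D) -> (B, D) neighbor features."""
    sims = feats @ bank.T            # cosine similarities
    return bank[sims.argmax(dim=1)]  # closest bank entry per input
```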
arXiv Detail & Related papers (2024-12-05T04:58:28Z)
- LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies [22.100031612580356]
We tackle the challenge of predicting models' Out-of-Distribution (OOD) performance using in-distribution (ID) measurements without requiring OOD data.
We introduce the Lowest Common Ancestor (LCA)-on-the-Line framework, which measures the hierarchical distance between labels and predictions within a predefined class hierarchy.
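As a concrete illustration, the LCA distance between a prediction and a label can be computed from parent pointers in the taxonomy; the tree encoding below is an assumption of the sketch (for ImageNet classes, the hierarchy would typically come from WordNet):
```python
def lca_distance(pred, label, parent):
    """Hierarchical distance between two classes via their lowest common
    ancestor (LCA). `parent` maps each node to its parent (root -> None);
    this encoding of the class hierarchy is an assumption of the sketch."""
    def path_to_root(node):
        path = []
        while node is not None:
            path.append(node)
            node = parent[node]
        return path

    pa, pb = path_to_root(pred), path_to_root(label)
    shared = set(pb)
    lca = next(n for n in pa if n in shared)  # first common ancestor
    return pa.index(lca) + pb.index(lca)      # edges pred->LCA + label->LCA
```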
arXiv Detail & Related papers (2024-07-22T21:54:19Z)
- SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation [5.590633742488972]
Out-of-distribution (OOD) detection is crucial for the safe deployment of neural networks.
We propose SeTAR, a training-free OOD detection method.
SeTAR enhances OOD detection via post-hoc modification of the model's weight matrices using a simple greedy search algorithm.
Our work offers a scalable, efficient solution for OOD detection, setting a new state-of-the-art in this area.
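The building block behind such a method is a truncated-SVD low-rank approximation of a weight matrix; the sketch below shows only that block, not SeTAR's greedy search over layers and ranks:
```python
import torch

def low_rank_approx(W, k):
    """Replace a weight matrix by its best rank-k approximation
    (truncated SVD). SeTAR-style methods apply such post-hoc edits to
    selected weight matrices; the selection/greedy search is omitted here."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]
```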
arXiv Detail & Related papers (2024-06-18T13:55:13Z)
- Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Detecting out-of-distribution (OOD) samples is crucial when deploying machine learning models in open-world scenarios.
We propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLMs) to envision potential Outlier Exposure, termed EOE.
EOE can be generalized to different tasks, including far, near, and fine-grained OOD detection.
EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
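One way such LLM-envisioned outlier labels could be used at test time is a similarity margin between in-distribution class prompts and outlier prompts; this scoring rule is an illustrative assumption, not necessarily EOE's exact score:
```python
import torch

def outlier_exposure_score(img_feats, id_txt, outlier_txt):
    """Margin between the best-matching in-distribution prompt and the
    best-matching LLM-suggested outlier prompt (all rows L2-normalized).
    img_feats: (B, D), id_txt: (C, D), outlier_txt: (K, D) -> (B,) scores;
    low values suggest OOD. Illustrative; EOE's exact rule may differ."""
    sim_id = (img_feats @ id_txt.T).max(dim=1).values
    sim_out = (img_feats @ outlier_txt.T).max(dim=1).values
    return sim_id - sim_out
```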
arXiv Detail & Related papers (2024-06-02T17:09:48Z)
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades OOD performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach that leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
We present a novel prompt-based scheme for training the UIC model, making full use of the powerful generalization ability of VL-PTMs.
arXiv Detail & Related papers (2022-05-26T03:13:43Z)
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We show that even a much smaller dataset with well-matched annotations can help models achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
- $k$Folden: $k$-Fold Ensemble for Out-Of-Distribution Detection [31.10536251430344]
Out-of-Distribution (OOD) detection is an important problem in natural language processing (NLP).
We propose a framework $k$Folden, which mimics the behaviors of OOD detection during training without the use of any external data.
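The core data split can be sketched directly: for k labels, build k sub-tasks that each hold one label out as simulated OOD (model training and score aggregation are omitted):
```python
def label_folds(labels):
    """kFolden-style label folds: for k classes, return k (seen, held_out)
    pairs; each sub-model trains on the k-1 seen labels so the held-out
    label behaves as OOD during training. Sketch of the split only."""
    return [
        ([l for j, l in enumerate(labels) if j != i], labels[i])
        for i in range(len(labels))
    ]

# e.g. label_folds(["sports", "politics", "tech"]) ->
# [(["politics", "tech"], "sports"), (["sports", "tech"], "politics"), ...]
```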
arXiv Detail & Related papers (2021-08-29T01:52:11Z)