Adapting Contrastive Language-Image Pretrained (CLIP) Models for
Out-of-Distribution Detection
- URL: http://arxiv.org/abs/2303.05828v2
- Date: Thu, 9 Nov 2023 10:23:29 GMT
- Title: Adapting Contrastive Language-Image Pretrained (CLIP) Models for
Out-of-Distribution Detection
- Authors: Nikolas Adaloglou and Felix Michels and Tim Kaiser and Markus Kollmann
- Abstract summary: We present a comprehensive experimental study on pretrained feature extractors for visual out-of-distribution (OOD) detection.
We propose a simple and scalable method called pseudo-label probing (PLP) that adapts vision-language models for OOD detection.
- Score: 1.597617022056624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comprehensive experimental study on pretrained feature
extractors for visual out-of-distribution (OOD) detection, focusing on adapting
contrastive language-image pretrained (CLIP) models. Without fine-tuning on the
training data, we establish a positive correlation (R² ≥ 0.92) between
in-distribution classification and unsupervised OOD detection for CLIP models
across 4 benchmarks. We further propose a simple and scalable method called
pseudo-label probing (PLP) that adapts vision-language models for OOD
detection. Given the label names of the training set, PLP trains a linear
layer using the pseudo-labels derived from the text encoder of CLIP. To test
the OOD detection robustness of pretrained models, we develop a novel
feature-based adversarial OOD data manipulation approach to create adversarial
samples. Intriguingly, we show that (i) PLP outperforms the previous
state-of-the-art (Ming et al., 2022) on all 5 large-scale ImageNet-based
benchmarks, with an average AUROC gain of 3.4% using the largest CLIP model
(ViT-G); (ii) linear probing outperforms fine-tuning by large margins for CLIP
architectures (e.g., CLIP ViT-H achieves a mean AUROC gain of 7.3% across all
ImageNet-based benchmarks); and (iii) billion-parameter CLIP models still fail
at detecting adversarially manipulated OOD images. The code and adversarially
created datasets will be made publicly available.
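A minimal PyTorch sketch of PLP as the abstract describes it, under assumptions not spelled out above: image and text features are precomputed and L2-normalized, and OOD scoring uses maximum softmax probability (the scoring rule is our assumption):
```python
import torch
import torch.nn.functional as F

def pseudo_label_probe(img_feats, txt_feats, epochs=50, lr=1e-3):
    """Pseudo-label probing (PLP), sketched from the abstract.

    img_feats: (N, D) L2-normalized CLIP image features of the training set.
    txt_feats: (C, D) L2-normalized CLIP text features, one per class name.
    """
    # Pseudo-label each image with its nearest class prompt in CLIP space.
    with torch.no_grad():
        pseudo = (img_feats @ txt_feats.T).argmax(dim=1)  # (N,)

    # Train a linear probe on the frozen image features.
    probe = torch.nn.Linear(img_feats.shape[1], txt_feats.shape[0])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(probe(img_feats), pseudo)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return probe

def msp_score(probe, feats):
    # Maximum softmax probability; lower values suggest OOD inputs.
    # (The abstract does not name the score function; MSP is an assumption.)
    with torch.no_grad():
        return F.softmax(probe(feats), dim=1).max(dim=1).values
```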
Related papers
- CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance [14.849943391904882]
We propose CLIP-PING: Contrastive Language-Image Pre-training with Proximus Intrinsic Neighbors Guidance.
CLIP-PING bootstraps unimodal features extracted from arbitrary pre-trained encoders to obtain intrinsic guidance of proximus neighbors.
Experiments reveal that CLIP-PING notably surpasses its peers in zero-shot generalization and cross-modal retrieval tasks.
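The neighbor-guidance idea can be illustrated with a simple nearest-neighbor lookup in a feature bank; this is a loose reading of the summary, not CLIP-PING's actual objective:
```python
import torch

def nearest_neighbor(feats, bank):
    """Retrieve each feature's nearest "intrinsic neighbor" from a support
    bank of unimodal features (all rows L2-normalized). Illustrative only;
    CLIP-PING's actual guidance objective is more involved.
    feats: (B, D), bank: (M, D) -> (B, D) neighbor features."""
    sims = feats @ bank.T            # cosine similarities
    return bank[sims.argmax(dim=1)]  # closest bank entry per input
```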
arXiv Detail & Related papers (2024-12-05T04:58:28Z)
- LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies [22.100031612580356]
We tackle the challenge of predicting models' Out-of-Distribution (OOD) performance using in-distribution (ID) measurements without requiring OOD data.
We introduce the Lowest Common Ancestor (LCA)-on-the-Line framework, which measures the hierarchical distance between labels and predictions within a predefined class hierarchy.
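As a concrete illustration, the LCA distance between a prediction and a label can be computed from parent pointers in the taxonomy; the tree encoding below is an assumption of the sketch (for ImageNet classes, the hierarchy would typically come from WordNet):
```python
def lca_distance(pred, label, parent):
    """Hierarchical distance between two classes via their lowest common
    ancestor (LCA). `parent` maps each node to its parent (root -> None);
    this encoding of the class hierarchy is an assumption of the sketch."""
    def path_to_root(node):
        path = []
        while node is not None:
            path.append(node)
            node = parent[node]
        return path

    pa, pb = path_to_root(pred), path_to_root(label)
    shared = set(pb)
    lca = next(n for n in pa if n in shared)  # first common ancestor
    return pa.index(lca) + pb.index(lca)      # edges pred->LCA + label->LCA
```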
arXiv Detail & Related papers (2024-07-22T21:54:19Z)
- SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation [5.590633742488972]
Out-of-distribution (OOD) detection is crucial for the safe deployment of neural networks.
We propose SeTAR, a training-free OOD detection method.
SeTAR enhances OOD detection via post-hoc modification of the model's weight matrices using a simple greedy search algorithm.
Our work offers a scalable, efficient solution for OOD detection, setting a new state-of-the-art in this area.
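The building block behind such a method is a truncated-SVD low-rank approximation of a weight matrix; the sketch below shows only that block, not SeTAR's greedy search over layers and ranks:
```python
import torch

def low_rank_approx(W, k):
    """Replace a weight matrix by its best rank-k approximation
    (truncated SVD). SeTAR-style methods apply such post-hoc edits to
    selected weight matrices; the selection/greedy search is omitted here."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]
```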
arXiv Detail & Related papers (2024-06-18T13:55:13Z)
- Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Detecting out-of-distribution (OOD) samples is crucial when deploying machine learning models in open-world scenarios.
We propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLMs) to envision potential Outlier Exposure, termed EOE.
EOE can be generalized to different tasks, including far, near, and fine-grained OOD detection.
EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
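One way such LLM-envisioned outlier labels could be used at test time is a similarity margin between in-distribution class prompts and outlier prompts; this scoring rule is an illustrative assumption, not necessarily EOE's exact score:
```python
import torch

def outlier_exposure_score(img_feats, id_txt, outlier_txt):
    """Margin between the best-matching in-distribution prompt and the
    best-matching LLM-suggested outlier prompt (all rows L2-normalized).
    img_feats: (B, D), id_txt: (C, D), outlier_txt: (K, D) -> (B,) scores;
    low values suggest OOD. Illustrative; EOE's exact rule may differ."""
    sim_id = (img_feats @ id_txt.T).max(dim=1).values
    sim_out = (img_feats @ outlier_txt.T).max(dim=1).values
    return sim_id - sim_out
```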
arXiv Detail & Related papers (2024-06-02T17:09:48Z)
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades OOD performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach that leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
We present a novel prompt-based scheme for training the UIC model, making full use of the powerful generalization ability of VL-PTMs.
arXiv Detail & Related papers (2022-05-26T03:13:43Z)
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We show that even a much smaller dataset with well-matched annotations can help models achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
- $k$Folden: $k$-Fold Ensemble for Out-Of-Distribution Detection [31.10536251430344]
Out-of-Distribution (OOD) detection is an important problem in natural language processing (NLP).
We propose a framework $k$Folden, which mimics the behaviors of OOD detection during training without the use of any external data.
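The core data split can be sketched directly: for k labels, build k sub-tasks that each hold one label out as simulated OOD (model training and score aggregation are omitted):
```python
def label_folds(labels):
    """kFolden-style label folds: for k classes, return k (seen, held_out)
    pairs; each sub-model trains on the k-1 seen labels so the held-out
    label behaves as OOD during training. Sketch of the split only."""
    return [
        ([l for j, l in enumerate(labels) if j != i], labels[i])
        for i in range(len(labels))
    ]

# e.g. label_folds(["sports", "politics", "tech"]) ->
# [(["politics", "tech"], "sports"), (["sports", "tech"], "politics"), ...]
```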
arXiv Detail & Related papers (2021-08-29T01:52:11Z)