COOkeD: Ensemble-based OOD detection in the era of zero-shot CLIP
- URL: http://arxiv.org/abs/2507.22576v1
- Date: Wed, 30 Jul 2025 11:02:38 GMT
- Title: COOkeD: Ensemble-based OOD detection in the era of zero-shot CLIP
- Authors: Galadrielle Humblot-Renaux, Gianni Franchi, Sergio Escalera, Thomas B. Moeslund
- Abstract summary: Out-of-distribution (OOD) detection is an important building block in trustworthy image recognition systems. We show that, given a little open-mindedness from both ends, remarkable OOD detection can be achieved by instead creating a heterogeneous ensemble. COOkeD achieves state-of-the-art performance and greater robustness compared to both classical and CLIP-based OOD detection methods.
- Score: 47.84776775118222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Out-of-distribution (OOD) detection is an important building block in trustworthy image recognition systems as unknown classes may arise at test-time. OOD detection methods typically revolve around a single classifier, leading to a split in the research field between the classical supervised setting (e.g. ResNet18 classifier trained on CIFAR100) vs. the zero-shot setting (class names fed as prompts to CLIP). In both cases, an overarching challenge is that the OOD detection performance is implicitly constrained by the classifier's capabilities on in-distribution (ID) data. In this work, we show that given a little open-mindedness from both ends, remarkable OOD detection can be achieved by instead creating a heterogeneous ensemble - COOkeD combines the predictions of a closed-world classifier trained end-to-end on a specific dataset, a zero-shot CLIP classifier, and a linear probe classifier trained on CLIP image features. While bulky at first sight, this approach is modular, post-hoc and leverages the availability of pre-trained VLMs, thus introduces little overhead compared to training a single standard classifier. We evaluate COOkeD on popular CIFAR100 and ImageNet benchmarks, but also consider more challenging, realistic settings ranging from training-time label noise, to test-time covariate shift, to zero-shot shift which has been previously overlooked. Despite its simplicity, COOkeD achieves state-of-the-art performance and greater robustness compared to both classical and CLIP-based OOD detection methods. Code is available at https://github.com/glhr/COOkeD
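The abstract describes combining three classifiers post-hoc. As an illustration only (the function name and the simple averaging-plus-MSP scoring are assumptions, not necessarily the authors' exact combination rule), the ensemble idea can be sketched as:

```python
import numpy as np

def cooked_style_score(probs_closed, probs_zeroshot, probs_probe):
    """Combine per-classifier softmax probabilities (each shape [K])
    by averaging, then score ID-ness via the maximum softmax
    probability of the ensemble. Higher score -> more in-distribution."""
    avg = (probs_closed + probs_zeroshot + probs_probe) / 3.0
    return float(avg.max())

# Toy example: three classifiers over K=3 classes.
p1 = np.array([0.7, 0.2, 0.1])   # closed-world classifier (trained end-to-end)
p2 = np.array([0.6, 0.3, 0.1])   # zero-shot CLIP classifier
p3 = np.array([0.8, 0.1, 0.1])   # linear probe on CLIP image features
score = cooked_style_score(p1, p2, p3)
```

Because each member is trained (or prompted) independently and only probabilities are combined, the approach stays modular and post-hoc, as the abstract emphasizes.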
Related papers
- Buffer-free Class-Incremental Learning with Out-of-Distribution Detection [18.706435793435094]
Class-incremental learning (CIL) poses significant challenges in open-world scenarios. We present an in-depth analysis of post-hoc OOD detection methods and investigate their potential to eliminate the need for a memory buffer. We show that this buffer-free approach achieves comparable or superior performance to buffer-based methods both in terms of class-incremental learning and the rejection of unknown samples.
arXiv Detail & Related papers (2025-05-29T13:01:00Z) - Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection [6.4348035950413]
We present the first generation-based model using CLIP for zero-shot HOI detection, coined HOIGen.
We develop a CLIP-injected feature generator in accordance with the generation of human, object and union features.
To enrich the HOI scores, we construct a generative prototype bank in a pairwise HOI recognition branch, and a multi-knowledge prototype bank in an image-wise HOI recognition branch.
arXiv Detail & Related papers (2024-08-12T08:02:37Z) - Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Out-of-distribution (OOD) samples are crucial when deploying machine learning models in open-world scenarios.
We propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLM) to potential Outlier Exposure, termed EOE.
EOE can be generalized to different tasks, including far, near, and fine-grained OOD detection.
EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
arXiv Detail & Related papers (2024-06-02T17:09:48Z) - CLIPScope: Enhancing Zero-Shot OOD Detection with Bayesian Scoring [16.0716584170549]
We introduce CLIPScope, a zero-shot OOD detection approach that normalizes the confidence score of a sample by class likelihoods.
CLIPScope incorporates a novel strategy to mine OOD classes from a large lexical database.
It selects class labels that are farthest and nearest to ID classes in terms of CLIP embedding distance to maximize coverage of OOD samples.
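The summary above says CLIPScope mines candidate OOD labels that are farthest and nearest to the ID classes in CLIP embedding space. A minimal sketch of that selection step (the function name and the max-similarity criterion are illustrative assumptions; embeddings are assumed L2-normalized so dot products are cosine similarities):

```python
import numpy as np

def mine_ood_labels(id_emb, lexicon_emb, k):
    """For each lexicon entry, compute its max cosine similarity to any
    ID class embedding, then keep the k farthest and k nearest entries.
    id_emb: [K, d], lexicon_emb: [N, d], both row-normalized."""
    sim = lexicon_emb @ id_emb.T          # [N, K] cosine similarities
    closeness = sim.max(axis=1)           # similarity to closest ID class
    order = np.argsort(closeness)         # ascending: farthest first
    return np.concatenate([order[:k], order[-k:]])

# Toy example: two ID classes and three candidate lexicon labels in 2D.
id_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
lexicon_emb = np.array([[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]])
picked = mine_ood_labels(id_emb, lexicon_emb, k=1)
```

Mixing farthest and nearest labels is what the summary credits for maximizing coverage of possible OOD samples.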
arXiv Detail & Related papers (2024-05-23T16:03:55Z) - A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation [121.0693322732454]
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP.
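Since GDA with a shared covariance reduces to a linear classifier over frozen features, a training-free adaptation on top of CLIP image features can be sketched as follows (a textbook GDA/LDA sketch under that assumption, not the paper's exact recipe; the regularization term `eps` is an assumption for numerical stability):

```python
import numpy as np

def gda_fit(features, labels, num_classes, eps=1e-3):
    """Fit per-class means and a shared (regularized) covariance over
    frozen image features, as in classical Gaussian Discriminant Analysis."""
    d = features.shape[1]
    means = np.stack([features[labels == c].mean(axis=0)
                      for c in range(num_classes)])
    centered = features - means[labels]
    cov = centered.T @ centered / len(features) + eps * np.eye(d)
    return means, np.linalg.inv(cov)

def gda_logits(x, means, prec):
    """Linear discriminant scores w_c^T x + b_c implied by the Gaussian
    class-conditional model with shared covariance (equal priors)."""
    w = means @ prec                                  # [K, d]
    b = -0.5 * np.einsum('kd,kd->k', w, means)        # per-class bias
    return x @ w.T + b

# Toy example: two well-separated 2D "feature" clusters.
features = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
labels = np.array([0, 0, 1, 1])
means, prec = gda_fit(features, labels, num_classes=2)
logits = gda_logits(features, means, prec)
```

No gradient updates are needed, which is what makes this kind of baseline "training-free".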
arXiv Detail & Related papers (2024-02-06T15:45:27Z) - Unified Classification and Rejection: A One-versus-All Framework [47.58109235690227]
We build a unified framework for building open set classifiers for both classification and OOD rejection.
By decomposing the $ K $-class problem into $ K $ one-versus-all (OVA) binary classification tasks, we show that combining the scores of OVA classifiers can give $ (K+1) $-class posterior probabilities.
Experiments on popular OSR and OOD detection datasets demonstrate that the proposed framework, using a single multi-class classifier, yields competitive performance.
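One simple way to turn K one-vs-all sigmoid outputs into (K+1)-class posteriors (an illustrative combination rule under an independence assumption, not necessarily the paper's exact formulation) is to give the extra "none of the above" class the mass where every OVA head rejects:

```python
import numpy as np

def ova_to_posteriors(sigmoids):
    """Turn K one-vs-all sigmoid outputs s_c into (K+1)-class posteriors,
    assuming the K binary classifiers are independent: class c gets mass
    s_c * prod_{j != c} (1 - s_j), and the (K+1)-th 'unknown' class gets
    prod_j (1 - s_j), i.e. every OVA head saying 'not mine'."""
    s = np.asarray(sigmoids, dtype=float)
    reject = np.prod(1.0 - s)
    masses = np.array([s[c] * np.prod(np.delete(1.0 - s, c))
                       for c in range(len(s))] + [reject])
    return masses / masses.sum()

# One confident OVA head -> that class dominates the posterior.
post = ova_to_posteriors([0.9, 0.1, 0.1])
# All heads reject -> the extra class (index K) dominates, i.e. OOD rejection.
post_ood = ova_to_posteriors([0.1, 0.1, 0.1])
```

This makes concrete how a single multi-class classifier with OVA heads can serve both classification and OOD rejection at once.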
arXiv Detail & Related papers (2023-11-22T12:47:12Z) - LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning [37.36999826208225]
We present a novel vision-language prompt learning approach for few-shot out-of-distribution (OOD) detection.
LoCoOp performs OOD regularization that utilizes the portions of CLIP local features as OOD features during training.
LoCoOp outperforms existing zero-shot and fully supervised detection methods.
arXiv Detail & Related papers (2023-06-02T06:33:08Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on CIFAR-10LT, CIFAR-100LT and Webvision datasets, observing that Prototypical obtains substantial improvements compared with the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.