ClusterMine: Robust Label-Free Visual Out-Of-Distribution Detection via Concept Mining from Text Corpora
- URL: http://arxiv.org/abs/2511.07068v1
- Date: Mon, 10 Nov 2025 13:04:01 GMT
- Title: ClusterMine: Robust Label-Free Visual Out-Of-Distribution Detection via Concept Mining from Text Corpora
- Authors: Nikolas Adaloglou, Diana Petrusheva, Mohamed Asker, Felix Michels, Markus Kollmann,
- Abstract summary: ClusterMine is first method to achieve state-of-the-art OOD detection performance without access to positive labels.<n>It extracts positive concepts from a large text corpus by combining visual-only sample consistency (via clustering) and zero-shot image-text consistency.
- Score: 0.9416970967424899
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale visual out-of-distribution (OOD) detection has witnessed remarkable progress by leveraging vision-language models such as CLIP. However, a significant limitation of current methods is their reliance on a pre-defined set of in-distribution (ID) ground-truth label names (positives). These fixed label names can be unavailable, unreliable at scale, or become less relevant due to in-distribution shifts after deployment. Towards truly unsupervised OOD detection, we utilize widely available text corpora for positive label mining, bypassing the need for positives. In this paper, we utilize widely available text corpora for positive label mining under a general concept mining paradigm. Within this framework, we propose ClusterMine, a novel positive label mining method. ClusterMine is the first method to achieve state-of-the-art OOD detection performance without access to positive labels. It extracts positive concepts from a large text corpus by combining visual-only sample consistency (via clustering) and zero-shot image-text consistency. Our experimental study reveals that ClusterMine is scalable across a plethora of CLIP models and achieves state-of-the-art robustness to covariate in-distribution shifts. The code is available at https://github.com/HHU-MMBS/clustermine_wacv_official.
Related papers
- COOD: Concept-based Zero-shot OOD Detection [12.361461338978732]
We introduce COOD, a novel zero-shot multi-label OOD detection framework.
By enriching the semantic space with both positive and negative concepts for each label, our approach models complex label dependencies.
Our method significantly outperforms existing approaches, achieving approximately 95% average AUROC on both VOC and datasets.
arXiv Detail & Related papers (2024-11-15T08:15:48Z) - Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models [70.82728812001807]
A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool.
We theorize that enhancing performance requires expanding the semantic pool.
We show that expanding OOD label candidates with the CSP satisfies the requirements and outperforms existing works by 7.89% in FPR95.
arXiv Detail & Related papers (2024-10-11T08:24:11Z) - Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Out-of-distribution (OOD) samples are crucial when deploying machine learning models in open-world scenarios.
We propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLM) to potential Outlier Exposure, termed EOE.
EOE can be generalized to different tasks, including far, near, and fine-language OOD detection.
EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
arXiv Detail & Related papers (2024-06-02T17:09:48Z) - Negative Label Guided OOD Detection with Pretrained Vision-Language Models [96.67087734472912]
Out-of-distribution (OOD) detection aims at identifying samples from unknown classes.
We propose a novel post hoc OOD detection method, called NegLabel, which takes a vast number of negative labels from extensive corpus databases.
arXiv Detail & Related papers (2024-03-29T09:19:52Z) - Towards Improved Illicit Node Detection with Positive-Unlabelled
Learning [5.879542875341689]
We discuss the label mechanism assumption for the hidden positive labels and its effect on the evaluation metrics.
We show that PU classifiers dealing with potential hidden positive labels can have improved performance compared to regular machine learning models.
arXiv Detail & Related papers (2023-03-04T17:36:09Z) - Dist-PU: Positive-Unlabeled Learning from a Label Distribution
Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z) - Binary Classification with Positive Labeling Sources [71.37692084951355]
We propose WEAPO, a simple yet competitive WS method for producing training labels without negative labeling sources.
We show WEAPO achieves the highest averaged performance on 10 benchmark datasets.
arXiv Detail & Related papers (2022-08-02T19:32:08Z) - Positive Unlabeled Contrastive Learning [14.975173394072053]
We extend the self-supervised pretraining paradigm to the classical positive unlabeled (PU) setting.
We develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme.
Our method handily outperforms state-of-the-art PU methods over several standard PU benchmark datasets.
arXiv Detail & Related papers (2022-06-01T20:16:32Z) - MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z) - Semi-supervised Active Learning for Instance Segmentation via Scoring
Predictions [25.408505612498423]
We propose a novel and principled semi-supervised active learning framework for instance segmentation.
Specifically, we present an uncertainty sampling strategy named Triplet Scoring Predictions (TSP) to explicitly incorporate samples ranking clues from classes, bounding boxes and masks.
Results on medical images datasets demonstrate that the proposed method results in the embodiment of knowledge from available data in a meaningful way.
arXiv Detail & Related papers (2020-12-09T02:36:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.