Related papers: Training-Free Out-Of-Distribution Segmentation With Foundation Models

Training-Free Out-Of-Distribution Segmentation With Foundation Models

URL: http://arxiv.org/abs/2510.02909v1
Date: Fri, 03 Oct 2025 11:27:40 GMT
Title: Training-Free Out-Of-Distribution Segmentation With Foundation Models
Authors: Laith Nayal, Hadi Salloum, Ahmad Taha, Yaroslav Kholodov, Alexander Gasnikov,
Abstract summary: Large vision foundation models, includ- ing DINOv2, InternImage, and CLIP, have advanced visual representation learn- ing by providing rich features that generalize well across diverse tasks.<n>We propose a training-free approach that utilizes features from the InternImage backbone and applies K-Means clustering alongside confidence thresholding on raw decoder logits to identify OoD clusters.<n>Our method achieves 50.02 Average Precision on the RoadAnomaly benchmark and 48.77 on the benchmark of ADE-OoD with InternImage-L, surpassing several supervised and unsupervised baselines.
Score: 38.00668980035719
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detecting unknown objects in semantic segmentation is crucial for safety-critical applications such as autonomous driving. Large vision foundation models, includ- ing DINOv2, InternImage, and CLIP, have advanced visual representation learn- ing by providing rich features that generalize well across diverse tasks. While their strength in closed-set semantic tasks is established, their capability to detect out- of-distribution (OoD) regions in semantic segmentation remains underexplored. In this work, we investigate whether foundation models fine-tuned on segmen- tation datasets can inherently distinguish in-distribution (ID) from OoD regions without any outlier supervision. We propose a simple, training-free approach that utilizes features from the InternImage backbone and applies K-Means clustering alongside confidence thresholding on raw decoder logits to identify OoD clusters. Our method achieves 50.02 Average Precision on the RoadAnomaly benchmark and 48.77 on the benchmark of ADE-OoD with InternImage-L, surpassing several supervised and unsupervised baselines. These results suggest a promising direc- tion for generic OoD segmentation methods that require minimal assumptions or additional data.

Related papers

Model Specific Task Similarity for Vision Language Model Selection via Layer Conductance [92.72779885657373]
We propose a framework that grounds model selection in the internal functional dynamics of the visual encoder.<n>Our approach represents each task via layer wise conductance and derives a target-conditioned block importance distribution through entropy regularized alignment.<n>Building on this, we introduce Directional Conductance Divergence (DCD), an asymmetric metric that quantifies how effectively a source task covers the target's salient functional blocks.
arXiv Detail & Related papers (2026-02-01T17:29:43Z)
ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction [57.930531826380836]
This work explores whether a foundational segmentation model can address label scarcity in the pixel-level vision task as an annotator for unlabeled images.<n>We propose ConformalSAM, a novel SSSS framework which first calibrates the foundation model using the target domain's labeled data and then filters out unreliable pixel labels of unlabeled data.
arXiv Detail & Related papers (2025-07-21T17:02:57Z)
Multi-Scale Foreground-Background Confidence for Out-of-Distribution Segmentation [0.36832029288386137]
We present a multi-scale OOD segmentation method that exploits the confidence information of a foreground-background segmentation model.<n>We consider the per pixel confidence score of the model prediction which is close to 1 for a pixel in a foreground object.<n>By aggregating these confidence values for different sized patches, objects of various sizes can be identified in a single image.
arXiv Detail & Related papers (2024-12-22T12:09:27Z)
A Likelihood Ratio-Based Approach to Segmenting Unknown Objects [4.000869978312742]
Outlier supervision is a widely used strategy for improving OoD detection of the existing segmentation networks. We propose an adaptive, lightweight unknown estimation module (UEM) for outlier supervision. Our approach achieves a new state-of-the-art across multiple datasets, outperforming the previous best method by 5.74% average precision points.
arXiv Detail & Related papers (2024-09-10T11:10:32Z)
Lidar Panoptic Segmentation and Tracking without Bells and Whistles [48.078270195629415]
We propose a detection-centric network for lidar segmentation and tracking. One of the core components of our network is the object instance detection branch. We evaluate our method on several 3D/4D LPS benchmarks and observe that our model establishes a new state-of-the-art among open-sourced models.
arXiv Detail & Related papers (2023-10-19T04:44:43Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
Cross-Modal Fine-Tuning: Align then Refine [83.37294254884446]
ORCA is a cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. We show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities.
arXiv Detail & Related papers (2023-02-11T16:32:28Z)
Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene [8.357801312689622]
We propose RegConsist, a method for self-supervised pre-training of a semantic segmentation model. We use a variant of contrastive learning to train a DCNN model for predicting semantic segmentation from RGB views in the target environment. The proposed method outperforms models pre-trained on ImageNet and achieves competitive performance when using models that are trained for exactly the same task but on a different dataset.
arXiv Detail & Related papers (2022-10-04T20:10:14Z)
Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain. We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one. Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.