Leveraging Text-Driven Semantic Variation for Robust OOD Segmentation
- URL: http://arxiv.org/abs/2511.07238v1
- Date: Mon, 10 Nov 2025 15:54:23 GMT
- Title: Leveraging Text-Driven Semantic Variation for Robust OOD Segmentation
- Authors: Seungheon Song, Jaekoo Lee,
- Abstract summary: We present a novel approach that trains a Text-Driven OOD model to learn a semantically diverse set of objects in the vision-language space.<n>By aligning visual and textual information, our approach effectively generalizes to unseen objects and provides robust OOD segmentation in diverse driving environments.
- Score: 2.4622211579286133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous driving and robotics, ensuring road safety and reliable decision-making critically depends on out-of-distribution (OOD) segmentation. While numerous methods have been proposed to detect anomalous objects on the road, leveraging the vision-language space-which provides rich linguistic knowledge-remains an underexplored field. We hypothesize that incorporating these linguistic cues can be especially beneficial in the complex contexts found in real-world autonomous driving scenarios. To this end, we present a novel approach that trains a Text-Driven OOD Segmentation model to learn a semantically diverse set of objects in the vision-language space. Concretely, our approach combines a vision-language model's encoder with a transformer decoder, employs Distance-Based OOD prompts located at varying semantic distances from in-distribution (ID) classes, and utilizes OOD Semantic Augmentation for OOD representations. By aligning visual and textual information, our approach effectively generalizes to unseen objects and provides robust OOD segmentation in diverse driving environments. We conduct extensive experiments on publicly available OOD segmentation datasets such as Fishyscapes, Segment-Me-If-You-Can, and Road Anomaly datasets, demonstrating that our approach achieves state-of-the-art performance across both pixel-level and object-level evaluations. This result underscores the potential of vision-language-based OOD segmentation to bolster the safety and reliability of future autonomous driving systems.
Related papers
- Mind the Way You Select Negative Texts: Pursuing the Distance Consistency in OOD Detection with VLMs [80.03370593724422]
Out-of-distribution (OOD) detection seeks to identify samples from unknown classes.<n>Current methods often incorporate intra-modal distance during OOD detection, such as comparing negative texts with ID labels.<n>We propose InterNeg, a framework that systematically utilizes consistent inter-modal distance enhancement from textual and visual perspectives.
arXiv Detail & Related papers (2026-03-03T05:44:47Z) - Vision-Based Natural Language Scene Understanding for Autonomous Driving: An Extended Dataset and a New Model for Traffic Scene Description Generation [1.3682156035049033]
This paper presents a framework that transforms a single frontal-view camera image into a concise natural language description.<n>It integrates spatial and semantic feature extraction to generate contextually rich and detailed scene descriptions.<n>The proposed model achieves strong performance and effectively fulfills its intended objectives on a newly developed dataset.
arXiv Detail & Related papers (2026-01-20T19:50:42Z) - CoT-Segmenter: Enhancing OOD Detection in Dense Road Scenes via Chain-of-Thought Reasoning [10.100430371132463]
We propose a novel Chain-of-Thought (CoT)-based framework targeting OOD detection in road anomaly scenes.<n>Our framework consistently outperforms state-of-the-art methods on both standard benchmarks and our newly defined challenging subset of the RoadAnomaly dataset.
arXiv Detail & Related papers (2025-07-05T10:23:40Z) - RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations [33.971901643313856]
RUNA is a novel framework for detecting out-of-distribution (OOD) objects.<n>It employs a regional uncertainty alignment mechanism to distinguish ID from OOD objects effectively.<n>Our experiments show that RUNA substantially surpasses state-of-the-art methods in object-level OOD detection.
arXiv Detail & Related papers (2025-03-28T10:01:55Z) - Generating Out-Of-Distribution Scenarios Using Language Models [58.47597351184034]
Large Language Models (LLMs) have shown promise in autonomous driving.
This paper introduces a framework for generating diverse Out-Of-Distribution (OOD) driving scenarios.
We evaluate our framework through extensive simulations and introduce a new "OOD-ness" metric.
arXiv Detail & Related papers (2024-11-25T16:38:17Z) - Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Out-of-distribution (OOD) samples are crucial when deploying machine learning models in open-world scenarios.
We propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLM) to potential Outlier Exposure, termed EOE.
EOE can be generalized to different tasks, including far, near, and fine-language OOD detection.
EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
arXiv Detail & Related papers (2024-06-02T17:09:48Z) - Driver Activity Classification Using Generalizable Representations from Vision-Language Models [0.0]
We present a novel approach leveraging generalizable representations from vision-language models for driver activity classification.
Our results suggest that vision-language representations offer a promising avenue for driver monitoring systems.
arXiv Detail & Related papers (2024-04-23T10:42:24Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-RValModal.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - Out-of-Domain Intent Detection Considering Multi-Turn Dialogue Contexts [91.43701971416213]
We introduce a context-aware OOD intent detection (Caro) framework to model multi-turn contexts in OOD intent detection tasks.
Caro establishes state-of-the-art performances on multi-turn OOD detection tasks by improving the F1-OOD score of over $29%$ compared to the previous best method.
arXiv Detail & Related papers (2023-05-05T01:39:21Z) - Improving Out-of-Distribution Detection with Disentangled Foreground and Background Features [23.266183020469065]
We propose a novel framework that disentangles foreground and background features from ID training samples via a dense prediction approach.
It is a generic framework that allows for a seamless combination with various existing OOD detection methods.
arXiv Detail & Related papers (2023-03-15T16:12:14Z) - Triggering Failures: Out-Of-Distribution detection by learning from
local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet associated with a dedicated training scheme based on Local Adversarial Attacks (LAA)
We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.