Road Obstacle Video Segmentation
- URL: http://arxiv.org/abs/2509.13181v1
- Date: Tue, 16 Sep 2025 15:34:43 GMT
- Title: Road Obstacle Video Segmentation
- Authors: Shyam Nandan Rai, Shyamgopal Karthik, Mariana-Iuliana Georgescu, Barbara Caputo, Carlo Masone, Zeynep Akata
- Abstract summary: We demonstrate that the road-obstacle segmentation task is inherently temporal, since the segmentation maps for consecutive frames are strongly correlated. Our approach establishes a new state-of-the-art in road-obstacle video segmentation for long-range video sequences.
- Score: 71.92123495914892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the growing deployment of autonomous driving agents, the detection and segmentation of road obstacles have become critical to ensure safe autonomous navigation. However, existing road-obstacle segmentation methods are applied on individual frames, overlooking the temporal nature of the problem, leading to inconsistent prediction maps between consecutive frames. In this work, we demonstrate that the road-obstacle segmentation task is inherently temporal, since the segmentation maps for consecutive frames are strongly correlated. To address this, we curate and adapt four evaluation benchmarks for road-obstacle video segmentation and evaluate 11 state-of-the-art image- and video-based segmentation methods on these benchmarks. Moreover, we introduce two strong baseline methods based on vision foundation models. Our approach establishes a new state-of-the-art in road-obstacle video segmentation for long-range video sequences, providing valuable insights and direction for future research.
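The frame-to-frame inconsistency mentioned in the abstract can be illustrated with a small sketch. This is not the paper's evaluation protocol: it assumes binary per-frame obstacle masks stored as nested lists, and the helper names `mask_iou` and `temporal_consistency` are hypothetical. It scores a sequence by the mean Intersection-over-Union (IoU) between consecutive masks, which is only a crude proxy because it ignores camera and object motion.

```python
from typing import List, Sequence

# Binary mask: 1 = obstacle pixel, 0 = background (assumed representation).
Mask = Sequence[Sequence[int]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-Union of two equally sized binary masks."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    # Two empty masks agree perfectly, so treat them as IoU 1.0.
    return 1.0 if union == 0 else inter / union


def temporal_consistency(masks: List[Mask]) -> float:
    """Mean IoU between each pair of consecutive per-frame masks."""
    if len(masks) < 2:
        raise ValueError("need at least two frames")
    scores = [mask_iou(prev, cur) for prev, cur in zip(masks, masks[1:])]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three toy 2x2 frames: the prediction loses one pixel in the last frame.
    frames = [
        [[0, 1], [0, 1]],
        [[0, 1], [0, 1]],
        [[0, 0], [0, 1]],
    ]
    print(temporal_consistency(frames))
```

A per-frame method whose predictions flicker would score low on such a measure even when each individual frame looks reasonable, which is the kind of behavior the abstract attributes to image-only approaches.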
Related papers
- Unsupervised Monocular Road Segmentation for Autonomous Driving via Scene Geometry [2.8647133890966994]
This paper presents a fully unsupervised approach for binary road segmentation (road vs. non-road). The method leverages scene geometry and temporal cues to distinguish road from non-road regions. On the Cityscapes dataset, the model achieves an Intersection-over-Union (IoU) of 0.82, demonstrating high accuracy with a simple design.
arXiv Detail & Related papers (2025-10-19T10:59:43Z)
- TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving [52.25176274203747]
TopoStreamer is an end-to-end temporal perception model for lane segment topology reasoning. TopoStreamer introduces three key improvements: streaming attribute constraints, dynamic lane boundary positional encoding, and lane segment denoising. On the Open-Lane-V2 dataset, TopoStreamer demonstrates significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2025-07-01T12:10:46Z)
- LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation [87.71768494466959]
LaneTCA bridges the individual video frames and explores how to effectively aggregate the temporal context.
We develop an accumulative attention module and an adjacent attention module to abstract the long-term and short-term temporal context.
The two modules are meticulously designed based on the transformer architecture.
arXiv Detail & Related papers (2024-08-25T14:46:29Z)
- Homography Guided Temporal Fusion for Road Line and Marking Segmentation [73.47092021519245]
Road lines and markings are frequently occluded in the presence of moving vehicles, shadow, and glare.
We propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues.
We show that exploiting available camera intrinsic data and a ground-plane assumption for cross-frame correspondence can lead to a lightweight network with significantly improved speed and accuracy.
arXiv Detail & Related papers (2024-04-11T10:26:40Z)
- PaRK-Detect: Towards Efficient Multi-Task Satellite Imagery Road Extraction via Patch-Wise Keypoints Detection [12.145321599949236]
We propose a new scheme for multi-task satellite imagery road extraction, Patch-wise Road Keypoints Detection (PaRK-Detect).
Our framework predicts the position of patch-wise road keypoints and the adjacent relationships between them to construct road graphs in a single pass.
We evaluate our approach against the existing state-of-the-art methods on DeepGlobe, Massachusetts Roads, and RoadTracer datasets and achieve competitive or better results.
arXiv Detail & Related papers (2023-02-26T08:26:26Z)
- Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z)
- Motion-supervised Co-Part Segmentation [88.40393225577088]
We propose a self-supervised deep learning method for co-part segmentation.
Our approach develops the idea that motion information inferred from videos can be leveraged to discover meaningful object parts.
arXiv Detail & Related papers (2020-04-07T09:56:45Z)