Related papers: Diffusion-FS: Multimodal Free-Space Prediction via Diffusion for Autonomous Driving


URL: http://arxiv.org/abs/2507.18763v1
Date: Thu, 24 Jul 2025 19:30:55 GMT
Title: Diffusion-FS: Multimodal Free-Space Prediction via Diffusion for Autonomous Driving
Authors: Keshav Gupta, Tejas S. Stanley, Pranjal Paul, Arun K. Singh, K. Madhava Krishna
Abstract summary: Drivable free-space prediction is a fundamental and crucial problem in autonomous driving. Recent works have addressed the problem by representing the entire non-obstacle road regions as the free-space. Our aim is to estimate the driving corridors that are a navigable subset of the entire road region.
Score: 7.667821982085968
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Drivable free-space prediction is a fundamental and crucial problem in autonomous driving. Recent works have addressed the problem by representing the entire non-obstacle road regions as the free-space. In contrast, our aim is to estimate the driving corridors that are a navigable subset of the entire road region. Unfortunately, existing corridor estimation methods directly assume a BEV-centric representation, which is hard to obtain. In contrast, we frame drivable free-space corridor prediction as a pure image perception task, using only monocular camera input. However, such a formulation poses several challenges as one doesn't have the corresponding data for such free-space corridor segments in the image. Consequently, we develop a novel self-supervised approach for free-space sample generation by leveraging future ego trajectories and front-view camera images, making the process of visual corridor estimation dependent on the ego trajectory. We then employ a diffusion process to model the distribution of such segments in the image. However, the existing binary mask-based representation for a segment poses many limitations. Therefore, we introduce ContourDiff, a specialized diffusion-based architecture that denoises over contour points rather than relying on binary mask representations, enabling structured and interpretable free-space predictions. We evaluate our approach qualitatively and quantitatively on both nuScenes and CARLA, demonstrating its effectiveness in accurately predicting safe multimodal navigable corridors in the image.

Related papers

Unsupervised Region-Based Image Editing of Denoising Diffusion Models [50.005612464340246]
We propose a method to identify semantic attributes in the latent space of pre-trained diffusion models without any further training. Our approach facilitates precise semantic discovery and control over local masked areas, eliminating the need for annotations.
arXiv Detail & Related papers (2024-12-17T13:46:12Z)
Homography Guided Temporal Fusion for Road Line and Marking Segmentation [73.47092021519245]
Road lines and markings are frequently occluded in the presence of moving vehicles, shadow, and glare. We propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues. We show that exploiting available camera intrinsic data and ground plane assumption for cross-frame correspondence can lead to a light-weight network with significantly improved performances in speed and accuracy.
arXiv Detail & Related papers (2024-04-11T10:26:40Z)
BEVSeg2TP: Surround View Camera Bird's-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction [4.328789276903559]
Trajectory prediction is a key task for vehicle autonomy. There is a growing interest in learning-based trajectory prediction. We show that there is the potential to improve the performance of perception.
arXiv Detail & Related papers (2023-12-20T15:02:37Z)
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision. We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view. Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
Differentiable Raycasting for Self-supervised Occupancy Forecasting [52.61762537741392]
Motion planning for autonomous driving requires learning how the environment around an ego-vehicle evolves with time. In this paper, we use geometric occupancy as a natural alternative to view-dependent representations such as freespace. Our key insight is to use differentiable raycasting to "render" future occupancy predictions into future LiDAR sweep predictions.
arXiv Detail & Related papers (2022-10-04T21:35:21Z)
Unsupervised Foggy Scene Understanding via Self Spatial-Temporal Label Diffusion [51.11295961195151]
We exploit the characteristics of the foggy image sequence of driving scenes to densify the confident pseudo labels. Based on the two discoveries of local spatial similarity and adjacent temporal correspondence of the sequential image data, we propose a novel Target-Domain driven pseudo label Diffusion scheme. Our scheme helps the adaptive model achieve 51.92% and 53.84% mean intersection-over-union (mIoU) on two publicly available natural foggy datasets.
arXiv Detail & Related papers (2022-06-10T05:16:50Z)
NMR: Neural Manifold Representation for Autonomous Driving [2.2596039727344452]
We propose a representation for autonomous driving that learns to infer semantics and predict way-points on a manifold over a finite horizon. We do this using an iterative attention mechanism applied on a latent high dimensional embedding of surround monocular images and partial ego-vehicle state. We propose a sampling algorithm based on edge-adaptive coverage loss of BEV occupancy grid to generate the surface manifold.
arXiv Detail & Related papers (2022-05-11T14:58:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
