ForecastOcc: Vision-based Semantic Occupancy Forecasting
- URL: http://arxiv.org/abs/2602.08006v1
- Date: Sun, 08 Feb 2026 15:16:06 GMT
- Title: ForecastOcc: Vision-based Semantic Occupancy Forecasting
- Authors: Riya Mohan, Juana Valeria Hurtado, Rohit Mohan, Abhinav Valada,
- Abstract summary: We present ForecastOcc, the first framework for vision-based semantic occupancy forecasting that predicts future occupancy states and semantic categories.<n>Our framework yields semantic occupancy forecasts for multiple horizons directly from past camera images, without relying on externally estimated maps.
- Score: 16.699381591572163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous driving requires forecasting both geometry and semantics over time to effectively reason about future environment states. Existing vision-based occupancy forecasting methods focus on motion-related categories such as static and dynamic objects, while semantic information remains largely absent. Recent semantic occupancy forecasting approaches address this gap but rely on past occupancy predictions obtained from separate networks. This makes current methods sensitive to error accumulation and prevents learning spatio-temporal features directly from images. In this work, we present ForecastOcc, the first framework for vision-based semantic occupancy forecasting that jointly predicts future occupancy states and semantic categories. Our framework yields semantic occupancy forecasts for multiple horizons directly from past camera images, without relying on externally estimated maps. We evaluate ForecastOcc in two complementary settings: multi-view forecasting on the Occ3D-nuScenes dataset and monocular forecasting on SemanticKITTI, where we establish the first benchmark for this task. We introduce the first baselines by adapting two 2D forecasting modules within our framework. Importantly, we propose a novel architecture that incorporates a temporal cross-attention forecasting module, a 2D-to-3D view transformer, a 3D encoder for occupancy prediction, and a semantic occupancy head for voxel-level forecasts across multiple horizons. Extensive experiments on both datasets show that ForecastOcc consistently outperforms baselines, yielding semantically rich, future-aware predictions that capture scene dynamics and semantics critical for autonomous driving.
Related papers
- GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving [12.889523014369884]
We propose a geometric and semantic self-supervised pre-training method, GASP, that learns a unified representation by predicting, at any queried future point in spacetime.<n>By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, general representation of the environment and its evolution through time.
arXiv Detail & Related papers (2025-03-19T20:00:27Z) - ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to understanding scene scene.<n>This paper proposes a vision-based framework with three targeted improvements.<n>Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - Panoptic-Depth Forecasting [8.81078960241057]
We propose the panoptic-depth forecasting task for jointly predicting the panoptic segmentation and depth maps of unobserved future frames.
We extend the popular KITTI-360 and Cityscapes benchmarks by computing depth maps from LiDAR point clouds and leveraging labeled data.
We present two baselines and propose the novel PDcast architecture that learns rich-temporal representations by incorporating a transformer-based encoder, a forecasting module, and task-specific decoders.
arXiv Detail & Related papers (2024-09-18T14:21:07Z) - Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving [31.995016095663544]
LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye view for the scene representation.<n>Our proposed framework, Latent Occupancy Prediction (LOPR), performs L-OGM prediction in the latent space of a generative architecture.
arXiv Detail & Related papers (2024-07-30T18:37:59Z) - AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction [56.72301849123049]
We present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ dataset challenge at CVPR 2024.
Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling.
Our method combines regression with classification to address scale variations in different scenes, and leverages predicted flow to warp current voxel features to future frames, guided by future frame ground truth.
arXiv Detail & Related papers (2024-07-01T16:32:15Z) - Implicit Occupancy Flow Fields for Perception and Prediction in
Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory of the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z) - LOPR: Latent Occupancy PRediction using Generative Models [28.49346874213506]
LiDAR generated occupancy grid maps (L-OGMs) offer a robust bird's eye-view scene representation.<n>We propose a framework that decouples occupancy prediction into: representation learning and prediction within the learned latent space.
arXiv Detail & Related papers (2022-10-03T22:04:00Z) - Predicting Future Occupancy Grids in Dynamic Environment with
Spatio-Temporal Learning [63.25627328308978]
We propose a-temporal prediction network pipeline to generate future occupancy predictions.
Compared to current SOTA, our approach predicts occupancy for a longer horizon of 3 seconds.
We publicly release our grid occupancy dataset based on nulis to support further research.
arXiv Detail & Related papers (2022-05-06T13:45:32Z) - Joint Forecasting of Panoptic Segmentations with Difference Attention [72.03470153917189]
We study a new panoptic segmentation forecasting model that jointly forecasts all object instances in a scene.
We evaluate the proposed model on the Cityscapes and AIODrive datasets.
arXiv Detail & Related papers (2022-04-14T17:59:32Z) - Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, the other one anticipates the dynamics of detected things.
arXiv Detail & Related papers (2021-04-08T17:59:16Z) - Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a framework on graph to uncover relationships in different objects in the scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not-crossing the street, is a very crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.