Panoptic-Depth Forecasting
- URL: http://arxiv.org/abs/2409.12008v1
- Date: Wed, 18 Sep 2024 14:21:07 GMT
- Title: Panoptic-Depth Forecasting
- Authors: Juana Valeria Hurtado, Riya Mohan, Abhinav Valada
- Abstract summary: We propose the panoptic-depth forecasting task for jointly predicting the panoptic segmentation and depth maps of unobserved future frames.
We extend the popular KITTI-360 and Cityscapes benchmarks by computing depth maps from LiDAR point clouds and leveraging sequential labeled data.
We present two baselines and propose the novel PDcast architecture that learns rich spatio-temporal representations by incorporating a transformer-based encoder, a forecasting module, and task-specific decoders.
- Score: 8.81078960241057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Forecasting the semantics and 3D structure of scenes is essential for robots to navigate and plan actions safely. Recent methods have explored semantic and panoptic scene forecasting; however, they do not consider the geometry of the scene. In this work, we propose the panoptic-depth forecasting task for jointly predicting the panoptic segmentation and depth maps of unobserved future frames, from monocular camera images. To facilitate this work, we extend the popular KITTI-360 and Cityscapes benchmarks by computing depth maps from LiDAR point clouds and leveraging sequential labeled data. We also introduce a suitable evaluation metric that quantifies both the panoptic quality and depth estimation accuracy of forecasts in a coherent manner. Furthermore, we present two baselines and propose the novel PDcast architecture that learns rich spatio-temporal representations by incorporating a transformer-based encoder, a forecasting module, and task-specific decoders to predict future panoptic-depth outputs. Extensive evaluations demonstrate the effectiveness of PDcast across two datasets and three forecasting tasks, consistently addressing the primary challenges. We make the code publicly available at https://pdcast.cs.uni-freiburg.de.
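The abstract mentions a metric that quantifies panoptic quality and depth accuracy "in a coherent manner" but does not spell out its form. As a rough, hypothetical illustration of how such a coupling could work (not the paper's actual metric), the toy function below extends PQ-style matching so that a predicted segment only counts as a true positive when both its mask IoU and its mean relative depth error pass thresholds; the function name, thresholds, and matching scheme are all assumptions:

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

def depth_aware_pq(pred_masks, gt_masks, pred_depth, gt_depth,
                   iou_thresh=0.5, rel_depth_thresh=0.25):
    """Toy depth-aware panoptic quality (hypothetical, for illustration).

    A ground-truth segment is matched to at most one prediction; the match
    counts as a true positive only if mask IoU exceeds `iou_thresh` AND the
    mean absolute relative depth error inside the ground-truth mask is
    below `rel_depth_thresh`.
    """
    tp, iou_sum = 0, 0.0
    matched_pred = set()
    for g in gt_masks:
        best_iou, best_j = 0.0, None
        for j, p in enumerate(pred_masks):
            if j in matched_pred:
                continue
            iou = mask_iou(p, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou > iou_thresh:
            rel_err = np.mean(np.abs(pred_depth[g] - gt_depth[g]) / gt_depth[g])
            if rel_err < rel_depth_thresh:
                tp += 1
                iou_sum += best_iou
                matched_pred.add(best_j)
    fp = len(pred_masks) - len(matched_pred)
    fn = len(gt_masks) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    # Same PQ-style normalization: segmentation quality times recognition quality.
    return iou_sum / denom if denom > 0 else 0.0
```

A perfect forecast (exact masks and depth) scores 1.0, while a segment whose depth is badly wrong is penalized as both a false positive and a false negative, which is one simple way to make the panoptic and depth terms interact rather than just averaging two separate scores.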
Related papers
- ForecastOcc: Vision-based Semantic Occupancy Forecasting [16.699381591572163]
We present ForecastOcc, the first framework for vision-based semantic occupancy forecasting that predicts future occupancy states and semantic categories. Our framework yields semantic occupancy forecasts for multiple horizons directly from past camera images, without relying on externally estimated maps.
arXiv Detail & Related papers (2026-02-08T15:16:06Z) - Statistical Confidence Rescoring for Robust 3D Scene Graph Generation from Multi-View Images [56.134885746889026]
Semantic scene graph estimation methods utilize ground truth 3D annotations to accurately predict target objects, predicates, and relationships. We overcome the noisy reconstructed pseudo point-based geometry from predicted depth maps and reduce the amount of background noise present in multi-view image features. Our method outperforms current methods that use only multi-view images as the initial input.
arXiv Detail & Related papers (2025-08-05T21:25:50Z) - Range-Agnostic Multi-View Depth Estimation With Keyframe Selection [33.99466211478322]
Methods for 3D reconstruction from posed frames require prior knowledge about the scene metric range.
RAMDepth is an efficient and purely 2D framework that reverses the depth estimation and matching steps order.
arXiv Detail & Related papers (2024-01-25T18:59:42Z) - OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z) - Calibrating Panoramic Depth Estimation for Practical Localization and Mapping [20.621442016969976]
The absolute depth values of surrounding environments provide crucial cues for various assistive technologies, such as localization, navigation, and 3D structure estimation.
We propose that accurate depth estimated from panoramic images can serve as a powerful and light-weight input for a wide range of downstream tasks requiring 3D information.
arXiv Detail & Related papers (2023-08-27T04:50:05Z) - Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
arXiv Detail & Related papers (2023-05-04T11:32:16Z) - How Far Can I Go ? : A Self-Supervised Approach for Deterministic Video Depth Forecasting [23.134156184783357]
We present a novel self-supervised method to anticipate the depth estimate for a future, unobserved real-world urban scene.
This work is the first to explore self-supervised learning for monocular depth estimation of future, unobserved video frames.
arXiv Detail & Related papers (2022-07-01T15:51:17Z) - Joint Forecasting of Panoptic Segmentations with Difference Attention [72.03470153917189]
We study a new panoptic segmentation forecasting model that jointly forecasts all object instances in a scene.
We evaluate the proposed model on the Cityscapes and AIODrive datasets.
arXiv Detail & Related papers (2022-04-14T17:59:32Z) - P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior [133.76192155312182]
We propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth.
An extensive evaluation of our method shows that we set the new state of the art in supervised monocular depth estimation.
arXiv Detail & Related papers (2022-04-05T10:03:52Z) - Panoptic Segmentation Forecasting [71.75275164959953]
Our goal is to forecast the near future given a set of recent observations.
We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents.
We develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, the other one anticipates the dynamics of detected things.
arXiv Detail & Related papers (2021-04-08T17:59:16Z) - Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting [106.3504366501894]
Self-driving vehicles and robotic manipulation systems often forecast future object poses by first detecting and tracking objects.
This detect-then-forecast pipeline is expensive to scale, as pose forecasting algorithms typically require labeled sequences of object poses.
We propose to first forecast 3D sensor data and then detect/track objects on the predicted point cloud sequences to obtain future poses.
This makes it less expensive to scale pose forecasting, as the sensor data forecasting task requires no labels.
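The inversion described above is essentially a reordering of the same pipeline stages. A minimal sketch of the two orderings, where every component is a hypothetical stand-in callable so only the data flow is shown:

```python
# Toy contrast between the two pipeline orderings. All component names
# (detect, track, forecast_*) are hypothetical placeholders, not real APIs.

def detect_then_forecast(frames, detect, track, forecast_poses):
    """Classic pipeline: forecasting operates on object poses, so the
    pose forecaster must be trained on labeled pose sequences."""
    return forecast_poses(track(detect(frames)))

def forecast_then_detect(frames, forecast_sensor, detect, track):
    """SPF2-style inversion: forecast raw sensor data first (needs no
    labels), then run detection/tracking on the predicted sequence."""
    return track(detect(forecast_sensor(frames)))
```

The key difference is which stage consumes unlabeled data: in the inverted pipeline, the learned forecasting step sees only raw sensor sequences, so it can be trained self-supervised at scale.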
arXiv Detail & Related papers (2020-03-18T17:54:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.