ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
- URL: http://arxiv.org/abs/2410.11019v1
- Date: Mon, 14 Oct 2024 19:14:49 GMT
- Title: ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera
- Authors: Jing Liang, He Yin, Xuewei Qi, Jong Jin Park, Min Sun, Rajasimman Madhivanan, Dinesh Manocha
- Abstract summary: We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera.
Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions.
- Score: 53.20087549782785
- Abstract: We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera. Our approach generates a semantic occupancy map from a single RGB observation while simultaneously providing uncertainty estimates for semantic predictions. By designing a triplane-based deformable attention mechanism, our approach achieves a better geometric understanding of the scene than other SOTA approaches and reduces noise in semantic predictions. Additionally, through the use of a Conditional Variational AutoEncoder (CVAE), we estimate the uncertainties of these predictions. The generated semantic and uncertainty maps will aid in the formulation of navigation strategies that facilitate safe and permissible decision-making in the future. Evaluated on the Semantic-KITTI dataset, ET-Former achieves the highest IoU and mIoU, surpassing other methods by 15.16% in IoU and 24.24% in mIoU, while reducing GPU memory usage relative to existing methods by 25%-50.5%.
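The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual semantic scene completion convention: IoU scores binary occupancy (occupied vs. empty), while mIoU averages per-class IoU over the semantic classes. The function names, the `empty=0` label, and the toy voxel lists are hypothetical, and the handling of unknown or ignored voxels used by the real benchmark is omitted. This is not the authors' evaluation code.

```python
def completion_iou(pred, target, empty=0):
    """Binary occupancy IoU: a voxel is occupied if its label differs from `empty`."""
    pairs = list(zip(pred, target))
    inter = sum(1 for p, t in pairs if p != empty and t != empty)
    union = sum(1 for p, t in pairs if p != empty or t != empty)
    return inter / union if union else 0.0


def mean_iou(pred, target, classes):
    """Mean of per-class IoU over `classes`; classes absent from both inputs are skipped."""
    pairs = list(zip(pred, target))
    ious = []
    for c in classes:
        inter = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened voxel grids (hypothetical labels: 0 = empty, 1 and 2 = semantic classes).
pred = [0, 1, 1, 2, 0, 2]
target = [0, 1, 2, 2, 1, 0]

print(completion_iou(pred, target))    # occupancy IoU
print(mean_iou(pred, target, [1, 2]))  # mean IoU over the two semantic classes
```

Because IoU ignores class identity, a method can score well on IoU while still mislabeling occupied voxels, which is why the two numbers are reported separately.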
Related papers
- MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies [10.662778683303726]
We propose MetaOcc, a novel multi-modal occupancy prediction framework.
We first design a height self-attention module for effective 3D feature extraction from sparse radar points.
Finally, we develop a semi-supervised training procedure leveraging open-set segmentor and geometric constraints for pseudo-label generation.
arXiv Detail & Related papers (2025-01-26T03:51:56Z)
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely convolutional architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- Context-Conditioned Spatio-Temporal Predictive Learning for Reliable V2V Channel Prediction [25.688521281119037]
Vehicle-to-Vehicle (V2V) channel state information (CSI) prediction is challenging and crucial for optimizing downstream tasks.
Traditional prediction approaches focus on four-dimensional (4D) CSI, which includes predictions over time, bandwidth, and antenna (TX and RX) space.
We propose a novel context-conditioned spatio-temporal predictive learning method to capture dependencies within 4D CSI data.
arXiv Detail & Related papers (2024-09-16T04:15:36Z)
- Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving [45.886941596233974]
LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye view for the scene representation.
Our proposed framework performs L-OGM prediction in the latent space of a generative architecture.
We decode predictions using either a single-step decoder, which provides high-quality predictions in real-time, or a diffusion-based batch decoder.
arXiv Detail & Related papers (2024-07-30T18:37:59Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Camera-based 3D Semantic Scene Completion with Sparse Guidance Network [18.415854443539786]
We propose a camera-based semantic scene completion framework called SGN.
SGN propagates semantics from semantic-aware seed voxels to the whole scene based on spatial geometry cues.
Our experimental results demonstrate the superiority of our SGN over existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T04:17:27Z)
- Exploiting Diffusion Prior for Generalizable Dense Prediction [85.4563592053464]
Recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate.
We introduce DMP, a pipeline utilizing pre-trained T2I models as a prior for dense prediction tasks.
Despite limited-domain training data, the approach yields faithful estimations for arbitrary images, surpassing existing state-of-the-art algorithms.
arXiv Detail & Related papers (2023-11-30T18:59:44Z)
- Volumetric Semantically Consistent 3D Panoptic Mapping [77.13446499924977]
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating semantic 3D maps suitable for autonomous agents in unstructured environments.
It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions.
The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics.
arXiv Detail & Related papers (2023-09-26T08:03:10Z)