Related papers: Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting

Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting

URL: http://arxiv.org/abs/2504.01957v2
Date: Thu, 03 Apr 2025 07:01:32 GMT
Title: Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting
Authors: Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen,
Abstract summary: Bird's-eye view (BEV) perception has gained significant attention because it provides a unified representation to fuse multiple view images.<n>Recent state-of-the-art models utilize projection-based methods which formulate BEV perception as query learning to bypass explicit depth estimation.<n>We introduce a novel uncertainty-aware BEV perception framework that revisits unprojection-based methods and enhances them with depth un-certainty modeling.
Score: 25.43661979008945
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Bird's-eye view (BEV) perception has gained significant attention because it provides a unified representation to fuse multiple view images and enables a wide range of down-stream autonomous driving tasks, such as forecasting and planning. Recent state-of-the-art models utilize projection-based methods which formulate BEV perception as query learning to bypass explicit depth estimation. While we observe promising advancements in this paradigm, they still fall short of real-world applications because of the lack of uncertainty modeling and expensive computational requirement. In this work, we introduce GaussianLSS, a novel uncertainty-aware BEV perception framework that revisits unprojection-based methods, specifically the Lift-Splat-Shoot (LSS) paradigm, and enhances them with depth un-certainty modeling. GaussianLSS represents spatial dispersion by learning a soft depth mean and computing the variance of the depth distribution, which implicitly captures object extents. We then transform the depth distribution into 3D Gaussians and rasterize them to construct uncertainty-aware BEV features. We evaluate GaussianLSS on the nuScenes dataset, achieving state-of-the-art performance compared to unprojection-based methods. In particular, it provides significant advantages in speed, running 2.5x faster, and in memory efficiency, using 0.3x less memory compared to projection-based methods, while achieving competitive performance with only a 0.4% IoU difference.

Related papers

ProBA: Probabilistic Bundle Adjustment with the Bhattacharyya Coefficient [43.75661586211106]
ProBA explicitly models and propagates uncertainty in the 2D observations and the 3D scene structure.<n>Our method uses 3D Gaussians instead of point-like landmarks.<n>ProBA enhances the practicality of SLAM systems deployed in unstructured environments.
arXiv Detail & Related papers (2025-05-27T08:07:00Z)
LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation [10.434754671492723]
We propose LSSInst, a two-stage object detector incorporating BEV and instance representations in tandem. The proposed detector exploits fine-grained pixel-level features that can be flexibly integrated into existing LSS-based BEV networks. Our proposed framework is of excellent generalization ability and performance, which boosts the performances of modern LSS-based BEV perception methods without bells and whistles.
arXiv Detail & Related papers (2024-11-09T13:03:54Z)
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices. Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis [53.702118455883095]
We propose a novel method for synthesizing novel views from sparse views with Gaussian Splatting. Our key idea lies in exploring the self-supervisions inherent in the binocular stereo consistency between each pair of binocular images. Our method significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-10-24T15:10:27Z)
FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection [33.225938984092274]
We propose a Foreground Self-Distillation (FSD) scheme that effectively avoids the issue of distribution discrepancies. We also design two Point Cloud Intensification ( PCI) strategies to compensate for the sparsity of point clouds. We develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scale foreground features.
arXiv Detail & Related papers (2024-07-14T09:39:44Z)
Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian [49.21866794516328]
3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis. Previous approaches have incorporated depth supervision into the training of 3D Gaussians to mitigate overfitting. We introduce a novel method to supervise the depth distribution of 3D Gaussians, utilizing depth priors with integrated uncertainty estimates.
arXiv Detail & Related papers (2024-05-30T03:18:30Z)
Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction. Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z)
RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images. We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs) Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV. Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation [42.48819460873482]
Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions.
arXiv Detail & Related papers (2023-06-02T21:26:20Z)
BEV-IO: Enhancing Bird's-Eye-View 3D Detection with Instance Occupancy [58.92659367605442]
We present BEV-IO, a new 3D detection paradigm to enhance BEV representation with instance occupancy information. We show that BEV-IO can outperform state-of-the-art methods while only adding a negligible increase in parameters and computational overhead.
arXiv Detail & Related papers (2023-05-26T11:16:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.