GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving

URL: http://arxiv.org/abs/2506.00034v1
Date: Tue, 27 May 2025 01:43:02 GMT
Title: GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving
Authors: Shuai Liu, Quanmin Liang, Zefeng Li, Boyang Li, Kai Huang
Abstract summary: We introduce a Gaussian-based multi-sensor fusion framework for end-to-end autonomous driving. Our method employs intuitive and compact Gaussian representations as intermediate carriers to aggregate information from diverse sensors. The explicit features capture rich semantic and spatial information about the traffic scene, while the implicit features provide complementary cues for trajectory planning.
Score: 7.989953129185359
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-sensor fusion is crucial for improving the performance and robustness of end-to-end autonomous driving systems. Existing methods predominantly adopt either attention-based flatten fusion or bird's eye view fusion through geometric transformations. However, these approaches often suffer from limited interpretability or dense computational overhead. In this paper, we introduce GaussianFusion, a Gaussian-based multi-sensor fusion framework for end-to-end autonomous driving. Our method employs intuitive and compact Gaussian representations as intermediate carriers to aggregate information from diverse sensors. Specifically, we initialize a set of 2D Gaussians uniformly across the driving scene, where each Gaussian is parameterized by physical attributes and equipped with explicit and implicit features. These Gaussians are progressively refined by integrating multi-modal features. The explicit features capture rich semantic and spatial information about the traffic scene, while the implicit features provide complementary cues beneficial for trajectory planning. To fully exploit rich spatial and semantic information in Gaussians, we design a cascade planning head that iteratively refines trajectory predictions through interactions with Gaussians. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate the effectiveness and robustness of the proposed GaussianFusion framework. The source code will be released at https://github.com/Say2L/GaussianFusion.
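
The initialization step described in the abstract (a set of 2D Gaussians placed uniformly across the driving scene, each carrying physical attributes plus explicit and implicit features) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the attribute set (mean, scale, rotation), the scene extent, the grid size, the feature widths, and the names `Gaussian2DSet` and `init_uniform_gaussians` are all hypothetical, and the refinement and planning stages are not shown.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Gaussian2DSet:
    """Hypothetical container for N 2D Gaussians (all fields are assumptions)."""
    mean: np.ndarray      # (N, 2) centers in the ego frame
    scale: np.ndarray     # (N, 2) extent along each local axis
    rotation: np.ndarray  # (N,) orientation angle in radians
    explicit: np.ndarray  # (N, C_exp) explicit features
    implicit: np.ndarray  # (N, C_imp) implicit features


def init_uniform_gaussians(
    x_range=(-32.0, 32.0),  # assumed scene extent along x
    y_range=(-32.0, 32.0),  # assumed scene extent along y
    grid=(16, 16),          # assumed number of Gaussians per axis
    explicit_dim=8,         # assumed explicit feature width
    implicit_dim=32,        # assumed implicit feature width
):
    """Place one Gaussian at the center of each cell of a regular grid."""
    nx, ny = grid
    cell_x = (x_range[1] - x_range[0]) / nx
    cell_y = (y_range[1] - y_range[0]) / ny

    # Cell centers along each axis.
    xs = x_range[0] + cell_x * (np.arange(nx) + 0.5)
    ys = y_range[0] + cell_y * (np.arange(ny) + 0.5)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    mean = np.stack([gx.ravel(), gy.ravel()], axis=1)

    n = nx * ny
    # Start each Gaussian with a scale of half a cell and zero rotation.
    scale = np.tile(np.array([cell_x / 2.0, cell_y / 2.0]), (n, 1))
    rotation = np.zeros(n)

    # Features start at zero; a real model would learn or refine them.
    explicit = np.zeros((n, explicit_dim))
    implicit = np.zeros((n, implicit_dim))
    return Gaussian2DSet(mean, scale, rotation, explicit, implicit)


gaussians = init_uniform_gaussians()
print(gaussians.mean.shape, gaussians.explicit.shape, gaussians.implicit.shape)
```

A regular grid is just one simple way to read "uniformly across the driving scene"; the paper may use a different layout or parameterization.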

Related papers

GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians [4.635245015125757]
3D semantic occupancy prediction is one of the crucial tasks of autonomous driving. We propose a new approach to predict 3D semantic occupancy in complex environments. We use semantic 3D Gaussians alongside an innovative sensor fusion mechanism.
arXiv Detail & Related papers (2025-07-24T15:46:38Z)
GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention [15.890744831541452]
3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. We propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention.
arXiv Detail & Related papers (2025-05-15T20:05:08Z)
ADGaussian: Generalizable Gaussian Splatting for Autonomous Driving with Multi-modal Inputs [32.896888952578806]
We present a novel approach, termed ADGaussian, for generalizable street scene reconstruction. The proposed method enables high-quality rendering from single-view input.
arXiv Detail & Related papers (2025-04-01T05:40:23Z)
GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping [22.432252084121274]
The LiDAR-Inertial-Visual (LIV) sensor configuration has demonstrated superior performance in localization and dense mapping. We propose a novel real-time Gaussian-based simultaneous localization and mapping (SLAM) system. The framework achieves real-time performance while maintaining robust multi-sensor fusion capabilities.
arXiv Detail & Related papers (2025-01-15T09:04:56Z)
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction [55.60972844777044]
3D semantic occupancy prediction is an important task for robust vision-centric autonomous driving. Most existing methods leverage dense grid-based scene representations, overlooking the spatial sparsity of the driving scenes. We propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied.
arXiv Detail & Related papers (2024-12-05T17:59:58Z)
SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images [91.28365943547703]
A novel generalizable Gaussian Splatting method, SmileSplat, is proposed to reconstruct pixel-aligned Gaussian surfels for diverse scenarios. The proposed method achieves state-of-the-art performance in various 3D vision tasks.
arXiv Detail & Related papers (2024-11-27T05:52:28Z)
PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views [116.10577967146762]
PixelGaussian is an efficient framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Our method achieves state-of-the-art performance with good generalization to various numbers of views.
arXiv Detail & Related papers (2024-10-24T17:59:58Z)
Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos [58.22272760132996]
We show that existing 4D Gaussian methods dramatically fail in this setup because the monocular setting is underconstrained. We propose Dynamic Gaussian Marbles, which consist of three core modifications that target the difficulties of the monocular setting. We evaluate on the Nvidia Dynamic Scenes dataset and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly outperforms other Gaussian baselines in quality.
arXiv Detail & Related papers (2024-06-26T19:37:07Z)
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene. We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians. GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z)
The Schrödinger Bridge between Gaussian Measures has a Closed Form [101.79851806388699]
We focus on the dynamic formulation of OT, also known as the Schrödinger bridge (SB) problem. In this paper, we provide closed-form expressions for SBs between Gaussian measures.
arXiv Detail & Related papers (2022-02-11T15:59:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.