RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection
- URL: http://arxiv.org/abs/2507.19856v3
- Date: Sat, 08 Nov 2025 15:42:10 GMT
- Title: RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection
- Authors: Xiaokai Bai, Chenxu Zhou, Lianqing Zheng, Si-Yuan Cao, Jianan Liu, Xiaohan Zhang, Yiming Li, Zhengzhuang Zhang, Hui-liang Shen
- Abstract summary: We propose RaGS, a framework that leverages 3D Gaussian Splatting to fuse 4D radar and monocular cues for 3D object detection. RaGS achieves object-centric precision and comprehensive scene perception.
- Score: 22.546559563539272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 4D millimeter-wave radar is a promising sensing modality for autonomous driving, yet effective 3D object detection from 4D radar and monocular images remains challenging. Existing fusion approaches either rely on instance proposals lacking global context or dense BEV grids constrained by rigid structures, lacking a flexible and adaptive representation for diverse scenes. To address this, we propose RaGS, the first framework that leverages 3D Gaussian Splatting (GS) to fuse 4D radar and monocular cues for 3D object detection. 3D GS models the scene as a continuous field of Gaussians, enabling dynamic resource allocation to foreground objects while maintaining flexibility and efficiency. Moreover, the velocity dimension of 4D radar provides motion cues that help anchor and refine the spatial distribution of Gaussians. Specifically, RaGS adopts a cascaded pipeline to construct and progressively refine the Gaussian field. It begins with Frustum-based Localization Initiation (FLI), which unprojects foreground pixels to initialize coarse Gaussian centers. Then, Iterative Multimodal Aggregation (IMA) explicitly exploits image semantics and implicitly integrates 4D radar velocity geometry to refine the Gaussians within regions of interest. Finally, Multi-level Gaussian Fusion (MGF) renders the Gaussian field into hierarchical BEV features for 3D object detection. By dynamically focusing on sparse and informative regions, RaGS achieves object-centric precision and comprehensive scene perception. Extensive experiments on View-of-Delft, TJ4DRadSet, and OmniHD-Scenes demonstrate its robustness and SOTA performance. Code will be released.
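The abstract's FLI stage unprojects foreground pixels into 3D space to seed coarse Gaussian centers. As a minimal illustrative sketch (not the paper's released implementation), frustum-based unprojection under a pinhole camera model could look like the following; the intrinsics `fx`, `fy`, `cx`, `cy` and the per-pixel depth guesses are hypothetical values chosen only for the example:

```python
def unproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with an assumed depth into a camera-frame
    3D point via the pinhole model; such points could seed Gaussian centers."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

# Toy example: two foreground pixels with hypothetical intrinsics and depths.
fx = fy = 500.0
cx, cy = 320.0, 240.0
centers = [
    unproject_pixel(320.0, 240.0, 10.0, fx, fy, cx, cy),  # at principal point
    unproject_pixel(420.0, 240.0, 20.0, fx, fy, cx, cy),  # 100 px to the right
]
print(centers)  # [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0)]
```

In the actual method, these coarse centers would then be refined by the IMA stage using image semantics and 4D radar velocity cues before MGF renders them into BEV features.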
Related papers
- Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking [17.16370461224889]
4D temporal tracking is crucial for the safe and reliable operation of robots in dynamic environments. In this paper, we present Latent Gaussian Splatting for 4D panoptic occupancy tracking. We make code available at https://lags.cs.uni-freiburg.de/.
arXiv Detail & Related papers (2026-02-26T16:34:49Z) - Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting [52.18697134979677]
Recent advancements in computer vision have successfully extended Open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Existing methods employ codebooks or feature compression, causing information loss, thereby degrading segmentation quality. We introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Our framework outperforms state-of-the-art methods, while enabling real-time rendering with an approximate 43.7x speedup on 512-D feature maps.
arXiv Detail & Related papers (2025-12-24T04:16:18Z) - Rad-GS: Radar-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments [7.035816744210293]
Rad-GS is a 4D radar-camera SLAM system designed for kilometer-scale outdoor environments. We show that Rad-GS achieves performance comparable to traditional 3D Gaussian methods based on camera or LiDAR inputs.
arXiv Detail & Related papers (2025-11-20T06:32:46Z) - RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars [7.212370688384851]
4D automotive radars have gained increasing attention for autonomous driving due to their low cost, robustness, and inherent velocity measurement capability. Existing 4D radar-based 3D detectors rely heavily on pillar encoders for BEV feature extraction, where each point contributes to only a single BEV grid. This paper introduces an efficient and effective Gaussian-based 3D detector, namely RadarGaussianDet3D. Experiments on TJ4DRadSet and View-of-Delft demonstrate that RadarGaussianDet3D achieves state-of-the-art detection accuracy.
arXiv Detail & Related papers (2025-09-19T16:13:09Z) - LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation [56.4321049923868]
3D Gaussian Splatting (3DGS) has emerged as a novel explicit representation for 3D scenes, offering both high-fidelity reconstruction and efficient rendering. We propose Label-aware 3D Gaussian Splatting (LabelGS), a method that augments the Gaussian representation with object labels. LabelGS achieves a remarkable 22X speedup in training compared to Feature-3DGS, at a resolution of 1440X1080.
arXiv Detail & Related papers (2025-08-27T09:07:38Z) - Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation [2.7463268699570134]
4D Gaussian Splatting (4DGS) has emerged as an appealing approach due to its ability to model high-fidelity spatial and temporal variations. We introduce hybrid 3D-4D Gaussian Splatting (3D-4DGS), a novel framework that adaptively represents static regions with 3D Gaussians while reserving 4D Gaussians for dynamic elements. Our method achieves significantly faster training times compared to baseline 4D Gaussian Splatting methods while maintaining or improving the visual quality.
arXiv Detail & Related papers (2025-05-19T14:59:58Z) - GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention [15.890744831541452]
3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. We propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention.
arXiv Detail & Related papers (2025-05-15T20:05:08Z) - DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation [37.09199962653554]
We present DirectTriGS, a novel framework designed for 3D object generation with Gaussian Splatting (GS). The proposed generation framework can produce high-quality 3D object geometry and rendering results in the text-to-3D task.
arXiv Detail & Related papers (2025-03-10T04:05:38Z) - GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z) - GaussianAD: Gaussian-Centric End-to-End Autonomous Driving [23.71316979650116]
Vision-based autonomous driving shows great potential due to its satisfactory performance and low costs. Most existing methods adopt dense representations (e.g., bird's eye view) or sparse representations (e.g., instance boxes) for decision-making. This paper explores a Gaussian-centric end-to-end autonomous driving framework and exploits 3D semantic Gaussians to extensively yet sparsely describe the scene.
arXiv Detail & Related papers (2024-12-13T18:59:30Z) - 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection [12.14595005884025]
This paper introduces 3DGS into 3DOD for the first time, identifying two main challenges.
We propose an elegant and efficient solution by incorporating 2D Boundary Guidance.
We also propose a Box-Focused Sampling strategy using 2D boxes to generate object probability distribution in 3D spaces.
arXiv Detail & Related papers (2024-10-02T15:15:52Z) - PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting [59.277480452459315]
We propose a principled sensitivity pruning score that preserves visual fidelity and foreground details at significantly higher compression ratios. We also propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model without changing its training pipeline.
arXiv Detail & Related papers (2024-06-14T17:53:55Z) - $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving [82.82048452755394]
Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving.
Most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements.
We propose a self-supervised street Gaussian ($\textit{S}^3$Gaussian) method to decompose dynamic and static elements from 4D consistency.
arXiv Detail & Related papers (2024-05-30T17:57:08Z) - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z) - Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting.
Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians.
We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z) - SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.56357905500512]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z) - Gaussian Grouping: Segment and Edit Anything in 3D Scenes [65.49196142146292]
We propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.
Compared to the implicit NeRF representation, we show that the grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency.
arXiv Detail & Related papers (2023-12-01T17:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.