Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
- URL: http://arxiv.org/abs/2602.23172v1
- Date: Thu, 26 Feb 2026 16:34:49 GMT
- Title: Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
- Authors: Maximilian Luz, Rohit Mohan, Thomas Nürnberg, Yakov Miron, Daniele Cattaneo, Abhinav Valada
- Abstract summary: Capturing the 4D spatiotemporal surroundings is crucial for the safe and reliable operation of robots in dynamic environments. In this paper, we present Latent Gaussian Splatting for 4D panoptic occupancy tracking. We make our code available at https://lags.cs.uni-freiburg.de/.
- Score: 17.16370461224889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing 4D spatiotemporal surroundings is crucial for the safe and reliable operation of robots in dynamic environments. However, most existing methods address only one side of the problem: they either provide coarse geometric tracking via bounding boxes, or detailed 3D structures like voxel-based occupancy that lack explicit temporal association. In this work, we present Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking (LaGS) that advances spatiotemporal scene understanding in a holistic direction. Our approach incorporates camera-based end-to-end tracking with mask-based multi-view panoptic occupancy prediction, and addresses the key challenge of efficiently aggregating multi-view information into 3D voxel grids via a novel latent Gaussian splatting approach. Specifically, we first fuse observations into 3D Gaussians that serve as a sparse point-centric latent representation of the 3D scene, and then splat the aggregated features onto a 3D voxel grid that is decoded by a mask-based segmentation head. We evaluate LaGS on the Occ3D nuScenes and Waymo datasets, achieving state-of-the-art performance for 4D panoptic occupancy tracking. We make our code available at https://lags.cs.uni-freiburg.de/.
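The abstract describes splatting features from a sparse set of latent 3D Gaussians onto a dense voxel grid before decoding. As a rough illustration of that aggregation step, here is a minimal NumPy sketch, not the authors' implementation: the function name, the isotropic covariances, and the dense per-Gaussian loop are all simplifying assumptions made for clarity.

```python
import numpy as np

def splat_gaussians_to_voxels(means, sigmas, feats, grid_shape, extent):
    """Accumulate per-Gaussian latent features into a dense voxel grid.

    means:  (N, 3) Gaussian centers in world coordinates
    sigmas: (N,)   isotropic standard deviations (a simplification)
    feats:  (N, C) latent feature vectors
    grid_shape: (X, Y, Z) number of voxels per axis
    extent: ((xmin, xmax), (ymin, ymax), (zmin, zmax)) world bounds
    """
    X, Y, Z = grid_shape
    C = feats.shape[1]
    grid = np.zeros((X, Y, Z, C))
    weight = np.zeros((X, Y, Z, 1))

    # Voxel-center coordinates along each axis.
    axes = [np.linspace(lo, hi, n) for (lo, hi), n in zip(extent, grid_shape)]
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    centers = np.stack([xs, ys, zs], axis=-1)  # (X, Y, Z, 3)

    for mu, sigma, f in zip(means, sigmas, feats):
        # Gaussian weight of each voxel center under this splat.
        d2 = np.sum((centers - mu) ** 2, axis=-1)
        w = np.exp(-0.5 * d2 / sigma**2)[..., None]  # (X, Y, Z, 1)
        grid += w * f
        weight += w

    # Normalize so each voxel holds a weighted average of nearby features.
    return grid / np.clip(weight, 1e-8, None)
```

In the actual method this aggregation would be differentiable and learned end-to-end, and the resulting voxel feature grid would feed the mask-based segmentation head; the sketch only conveys the point-to-grid splatting idea.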
Related papers
- Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels [67.36972154532761]
Estimating the 3D trajectory of every pixel from a monocular video is crucial and promising for a comprehensive understanding of the 3D dynamics of videos. Recent monocular 3D tracking works demonstrate impressive performance, but are limited to either tracking sparse points on the first frame or a slow optimization-based framework for dense tracking. We propose a feedforward model, called Track4World, enabling efficient holistic 3D tracking of every pixel in the world-centric coordinate system.
arXiv Detail & Related papers (2026-03-03T03:45:43Z) - RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection [22.546559563539272]
We propose RaGS, a framework that leverages 3D Gaussian Splatting to fuse 4D radar and monocular cues for 3D object detection. RaGS achieves object-centric precision and comprehensive scene perception.
arXiv Detail & Related papers (2025-07-26T08:17:12Z) - Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps [13.325879149065008]
3D Gaussian Splatting (3DGS) has become a popular solution in SLAM due to its high-fidelity synthesis and real-time novel view performance. Previous 3DGS SLAM methods employ a differentiable rendering pipeline for tracking, but lack geometric priors in outdoor scenes. We propose a robust RGB-only outdoor 3DGS SLAM method: S3PO-GS. Technically, we establish a self-consistent tracking module anchored in the 3DGS pointmap, which avoids cumulative scale drift and achieves more precise and robust tracking with fewer iterations.
arXiv Detail & Related papers (2025-07-04T17:56:43Z) - H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting [39.2960379257236]
Dynamic scene reconstruction poses a persistent challenge in 3D vision. Deformable 3D Gaussian splatting has emerged as an effective method for this task, offering real-time rendering and high visual fidelity. This approach decomposes a dynamic scene into a static representation in a canonical space and time-varying scene motion. Experiments on the Neu3DV and CMU-Panoptic datasets demonstrate that our method achieves superior performance over state-of-the-art deformable 3D Gaussian splatting techniques.
arXiv Detail & Related papers (2024-08-23T12:51:49Z) - $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving [82.82048452755394]
Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving.
Most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements.
We propose a self-supervised street Gaussian ($\textit{S}^3$Gaussian) method to decompose dynamic and static elements from 4D consistency.
arXiv Detail & Related papers (2024-05-30T17:57:08Z) - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z) - LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field [13.815932949774858]
Cinemagraph is a form of visual media that combines elements of still photography and subtle motion to create a captivating experience.
We propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling.
Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation.
arXiv Detail & Related papers (2024-04-13T11:07:53Z) - HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z) - Oriented-grid Encoder for 3D Implicit Representations [10.02138130221506]
This paper is the first to exploit 3D characteristics in 3D geometric encoders explicitly.
Our method gets state-of-the-art results when compared to the prior techniques.
arXiv Detail & Related papers (2024-02-09T19:28:13Z) - SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.56357905500512]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z) - SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.