Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
- URL: http://arxiv.org/abs/2602.23172v1
- Date: Thu, 26 Feb 2026 16:34:49 GMT
- Title: Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
- Authors: Maximilian Luz, Rohit Mohan, Thomas Nürnberg, Yakov Miron, Daniele Cattaneo, Abhinav Valada
- Abstract summary: Capturing the 4D spatiotemporal surroundings is crucial for the safe and reliable operation of robots in dynamic environments. In this paper, we present Latent Gaussian Splatting for 4D panoptic occupancy tracking. We make our code available at https://lags.cs.uni-freiburg.de/.
- Score: 17.16370461224889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing 4D spatiotemporal surroundings is crucial for the safe and reliable operation of robots in dynamic environments. However, most existing methods address only one side of the problem: they either provide coarse geometric tracking via bounding boxes, or detailed 3D structures like voxel-based occupancy that lack explicit temporal association. In this work, we present Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking (LaGS) that advances spatiotemporal scene understanding in a holistic direction. Our approach incorporates camera-based end-to-end tracking with mask-based multi-view panoptic occupancy prediction, and addresses the key challenge of efficiently aggregating multi-view information into 3D voxel grids via a novel latent Gaussian splatting approach. Specifically, we first fuse observations into 3D Gaussians that serve as a sparse point-centric latent representation of the 3D scene, and then splat the aggregated features onto a 3D voxel grid that is decoded by a mask-based segmentation head. We evaluate LaGS on the Occ3D nuScenes and Waymo datasets, achieving state-of-the-art performance for 4D panoptic occupancy tracking. We make our code available at https://lags.cs.uni-freiburg.de/.
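The abstract describes splatting features from a sparse set of latent 3D Gaussians onto a dense voxel grid before decoding. As a rough illustration of that aggregation step, here is a minimal NumPy sketch, not the authors' implementation: the function name, the isotropic covariances, and the dense per-Gaussian loop are all simplifying assumptions made for clarity.

```python
import numpy as np

def splat_gaussians_to_voxels(means, sigmas, feats, grid_shape, extent):
    """Accumulate per-Gaussian latent features into a dense voxel grid.

    means:  (N, 3) Gaussian centers in world coordinates
    sigmas: (N,)   isotropic standard deviations (a simplification)
    feats:  (N, C) latent feature vectors
    grid_shape: (X, Y, Z) number of voxels per axis
    extent: ((xmin, xmax), (ymin, ymax), (zmin, zmax)) world bounds
    """
    X, Y, Z = grid_shape
    C = feats.shape[1]
    grid = np.zeros((X, Y, Z, C))
    weight = np.zeros((X, Y, Z, 1))

    # Voxel-center coordinates along each axis.
    axes = [np.linspace(lo, hi, n) for (lo, hi), n in zip(extent, grid_shape)]
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    centers = np.stack([xs, ys, zs], axis=-1)  # (X, Y, Z, 3)

    for mu, sigma, f in zip(means, sigmas, feats):
        # Gaussian weight of each voxel center under this splat.
        d2 = np.sum((centers - mu) ** 2, axis=-1)
        w = np.exp(-0.5 * d2 / sigma**2)[..., None]  # (X, Y, Z, 1)
        grid += w * f
        weight += w

    # Normalize so each voxel holds a weighted average of nearby features.
    return grid / np.clip(weight, 1e-8, None)
```

In the actual method this aggregation would be differentiable and learned end-to-end, and the resulting voxel feature grid would feed the mask-based segmentation head; the sketch only conveys the point-to-grid splatting idea.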
Related papers
- Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels [67.36972154532761]
Estimating the 3D trajectory of every pixel from a monocular video is crucial and promising for a comprehensive understanding of the 3D dynamics of videos. Recent monocular 3D tracking works demonstrate impressive performance, but are limited to either tracking sparse points on the first frame or a slow optimization-based framework for dense tracking. We propose a feedforward model, called Track4World, enabling efficient holistic 3D tracking of every pixel in the world-centric coordinate system.
arXiv Detail & Related papers (2026-03-03T03:45:43Z) - RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection [22.546559563539272]
We propose RaGS, a framework that leverages 3D Gaussian Splatting to fuse 4D radar and monocular cues for 3D object detection. RaGS achieves object-centric precision and comprehensive scene perception.
arXiv Detail & Related papers (2025-07-26T08:17:12Z) - Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps [13.325879149065008]
3D Gaussian Splatting (3DGS) has become a popular solution in SLAM due to its high-fidelity synthesis and real-time novel view performance. Previous 3DGS SLAM methods employ a differentiable rendering pipeline for tracking, but lack geometric priors in outdoor scenes. We propose a robust RGB-only outdoor 3DGS SLAM method: S3PO-GS. Technically, we establish a self-consistent tracking module anchored in the 3DGS pointmap, which avoids cumulative scale drift and achieves more precise and robust tracking with fewer iterations.
arXiv Detail & Related papers (2025-07-04T17:56:43Z) - H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting [39.2960379257236]
Dynamic scene reconstruction poses a persistent challenge in 3D vision. Deformable 3D Gaussian splatting has emerged as an effective method for this task, offering real-time rendering and high visual fidelity. This approach decomposes a dynamic scene into a static representation in a canonical space and time-varying scene motion. Experiments on the Neu3DV and CMU-Panoptic datasets demonstrate that our method achieves superior performance over state-of-the-art deformable 3D Gaussian splatting techniques.
arXiv Detail & Related papers (2024-08-23T12:51:49Z) - $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving [82.82048452755394]
Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving.
Most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements.
We propose a self-supervised street Gaussian ($\textit{S}^3$Gaussian) method to decompose dynamic and static elements from 4D consistency.
arXiv Detail & Related papers (2024-05-30T17:57:08Z) - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction [70.65250036489128]
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene.
We propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians.
GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.
arXiv Detail & Related papers (2024-05-27T17:59:51Z) - LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field [13.815932949774858]
Cinemagraph is a form of visual media that combines elements of still photography and subtle motion to create a captivating experience.
We propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling.
Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation.
arXiv Detail & Related papers (2024-04-13T11:07:53Z) - HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z) - Oriented-grid Encoder for 3D Implicit Representations [10.02138130221506]
This paper is the first to exploit 3D characteristics in 3D geometric encoders explicitly.
Our method gets state-of-the-art results when compared to the prior techniques.
arXiv Detail & Related papers (2024-02-09T19:28:13Z) - SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition [66.56357905500512]
3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis. We propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS. Our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.
arXiv Detail & Related papers (2024-01-31T14:19:03Z) - SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.