IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes
- URL: http://arxiv.org/abs/2511.19235v1
- Date: Mon, 24 Nov 2025 15:48:08 GMT
- Title: IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes
- Authors: Carl Lindström, Mahan Rafidashti, Maryam Fatemi, Lars Hammarstrand, Martin R. Oswald, Lennart Svensson,
- Abstract summary: Reconstructing dynamic driving scenes is essential for developing autonomous systems through sensor-realistic simulation.<n>We present IDSplat, a self-supervised 3D Gaussian Splatting framework that reconstructs dynamic scenes with explicit instance decomposition and learnable motion trajectories.<n>Our method achieves competitive reconstruction quality while maintaining instance-level decomposition and generalizes across diverse sequences and view densities without retraining.
- Score: 25.939318593012484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing dynamic driving scenes is essential for developing autonomous systems through sensor-realistic simulation. Although recent methods achieve high-fidelity reconstructions, they either rely on costly human annotations for object trajectories or use time-varying representations without explicit object-level decomposition, leading to intertwined static and dynamic elements that hinder scene separation. We present IDSplat, a self-supervised 3D Gaussian Splatting framework that reconstructs dynamic scenes with explicit instance decomposition and learnable motion trajectories, without requiring human annotations. Our key insight is to model dynamic objects as coherent instances undergoing rigid transformations, rather than unstructured time-varying primitives. For instance decomposition, we employ zero-shot, language-grounded video tracking anchored to 3D using lidar, and estimate consistent poses via feature correspondences. We introduce a coordinated-turn smoothing scheme to obtain temporally and physically consistent motion trajectories, mitigating pose misalignments and tracking failures, followed by joint optimization of object poses and Gaussian parameters. Experiments on the Waymo Open Dataset demonstrate that our method achieves competitive reconstruction quality while maintaining instance-level decomposition and generalizes across diverse sequences and view densities without retraining, making it practical for large-scale autonomous driving applications. Code will be released.
Related papers
- SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting [19.12278036176021]
We present SV-GS, a framework that simultaneously estimates a deformation model and the object's motion over time under sparse observations.<n>Our method outperforms existing approaches under sparse observations by up to 34% in PSNR.
arXiv Detail & Related papers (2026-01-01T09:53:03Z) - DIAL-GS: Dynamic Instance Aware Reconstruction for Label-free Street Scenes with 4D Gaussian Splatting [23.017838856573917]
We propose DIAL-GS, a novel dynamic instance-aware reconstruction method for label-free street scenes.<n>We first accurately identify dynamic instances by exploiting appearance-position inconsistency between warped rendering and actual observation.<n>We introduce a reciprocal mechanism through which identity and dynamics reinforce each other, enhancing both integrity and consistency.
arXiv Detail & Related papers (2025-11-10T02:18:40Z) - BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting [3.376357029373187]
B'ezier curve splatting (B'ezierGS) represents the motion trajectories of dynamic objects using learnable B'ezier curves.<n>B'ezierGS outperforms state-of-the-art alternatives in both dynamic and static scene components reconstruction and novel view synthesis.
arXiv Detail & Related papers (2025-06-27T10:30:16Z) - UnIRe: Unsupervised Instance Decomposition for Dynamic Urban Scene Reconstruction [36.00679909382783]
We propose UnIRe, a 3D Splatting (3DGS) based approach that decomposes a scene into a static background and individual dynamic instances.<n>At its core, we introduce 4D superpoints, a novel representation that clusters multi-frame LiDAR points in 4D space.<n>Experiments show that our method outperforms existing methods in dynamic scene reconstruction while enabling accurate and flexible instance-level editing.
arXiv Detail & Related papers (2025-04-01T13:15:58Z) - UrbanGS: Semantic-Guided Gaussian Splatting for Urban Scene Reconstruction [86.4386398262018]
UrbanGS uses 2D semantic maps and an existing dynamic Gaussian approach to distinguish static objects from the scene.<n>For potentially dynamic objects, we aggregate temporal information using learnable time embeddings.<n>Our approach outperforms state-of-the-art methods in reconstruction quality and efficiency.
arXiv Detail & Related papers (2024-12-04T16:59:49Z) - T-3DGS: Removing Transient Objects for 3D Scene Reconstruction [83.05271859398779]
Transient objects in video sequences can significantly degrade the quality of 3D scene reconstructions.<n>We propose T-3DGS, a novel framework that robustly filters out transient distractors during 3D reconstruction using Gaussian Splatting.
arXiv Detail & Related papers (2024-11-29T07:45:24Z) - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.<n>By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.<n>We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.<n>voxelization infers per-object occupancy probabilities at individual spatial locations.<n>Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z) - SMORE: Simultaneous Map and Object REconstruction [66.66729715211642]
We present a method for dynamic surface reconstruction of large-scale urban scenes from LiDAR.<n>We take a holistic perspective and optimize a compositional model of a dynamic scene that decomposes the world into rigidly-moving objects and the background.
arXiv Detail & Related papers (2024-06-19T23:53:31Z) - DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes [57.12439406121721]
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes.
For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene.
We then leverage a composite dynamic Gaussian graph to handle multiple moving objects.
We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes with greater details and maintain panoramic consistency.
arXiv Detail & Related papers (2023-12-13T06:30:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.