Deformable Gaussian Occupancy: Decoupling Rigid and Nonrigid Motion with Factorized Distillation
Abstract Overview
This paper studies weakly supervised 3D occupancy prediction for dynamic driving scenes, with particular attention to human-centric nonrigid motion that is not well handled by prior rigid-motion Gaussian methods. The proposed DeGO framework represents scenes with deformable Gaussian primitives whose motion is decoupled into rigid offsets and nonrigid deformation through a learnable rigidity mask. It also introduces factorized feature distillation from the VGGT foundation model to transfer cross-camera and cross-frame information into the Gaussian representation. On the Occ3D-NuScenes benchmark, the method improves temporal consistency and dynamic scene understanding while maintaining single-frame feed-forward inference at test time.
Novelty
The main novelty is the combination of decoupled Gaussian deformation and factorized 4D foundation-model distillation within a weakly supervised occupancy framework. A distinctive aspect is the per-Gaussian rigidity mask, which lets the model treat rigid structures and nonrigid agents differently instead of relying only on simple temporal offsets.
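To make the role of the rigidity mask concrete, here is a minimal sketch of how a per-Gaussian mask could blend a rigid offset with a nonrigid deformation. The blending rule, the sigmoid parameterization, and the names `deform_centers`, `rigid_offsets`, `nonrigid_deltas`, and `rigidity_logits` are assumptions made for illustration; the summary does not give the paper's exact formulation.

```python
import math


def sigmoid(x: float) -> float:
    """Map an unconstrained logit to a mask value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def deform_centers(centers, rigid_offsets, nonrigid_deltas, rigidity_logits):
    """Move each Gaussian center with a mask-weighted mix of two motions.

    Assumed rule (hypothetical, not taken from the paper):
        new_center = center + m * rigid_offset + (1 - m) * nonrigid_delta
    where m = sigmoid(rigidity_logit) is the per-Gaussian rigidity mask.
    """
    moved = []
    for mu, d_rigid, d_nonrigid, logit in zip(
        centers, rigid_offsets, nonrigid_deltas, rigidity_logits
    ):
        m = sigmoid(logit)
        moved.append(
            tuple(
                c + m * r + (1.0 - m) * n
                for c, r, n in zip(mu, d_rigid, d_nonrigid)
            )
        )
    return moved


if __name__ == "__main__":
    centers = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    rigid = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    nonrigid = [(0.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
    # A large positive logit behaves as "rigid", a large negative one as "nonrigid".
    print(deform_centers(centers, rigid, nonrigid, [50.0, -50.0]))
```

Under this assumed rule, a Gaussian with a mask value near 1 follows only the rigid offset, while one with a mask value near 0 follows only the deformation branch. Because the mask is continuous, it could be learned by gradient descent alongside the other Gaussian parameters.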
Results
On Occ3D-NuScenes, DeGO achieves state-of-the-art weakly supervised results with 45.38 IoU, 18.05 mIoU, 10.34 instance mIoU, 33.46 scene mIoU, and 11.04 human-centric mIoU. It improves on the best prior methods by 10.9% in overall mIoU and by 13.5% on the human-centric metric. Ablations indicate that both deformation modeling and VGGT-based distillation contribute significantly to performance.
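For readers unfamiliar with the metrics, the sketch below shows how IoU and mIoU are commonly computed on voxel labels: IoU as a class-agnostic occupied-versus-free score, and mIoU as the mean of per-class IoUs. The label ids, the `free_label` value, and the idea that the instance, scene, and human-centric variants average over class subsets are assumptions for illustration; the benchmark's actual evaluation protocol (for example, visibility masking) is not reproduced here.

```python
def occupancy_iou(pred, gt, free_label):
    """Class-agnostic IoU: a voxel counts as occupied if it is not free."""
    inter = sum(1 for p, g in zip(pred, gt) if p != free_label and g != free_label)
    union = sum(1 for p, g in zip(pred, gt) if p != free_label or g != free_label)
    return inter / union if union else float("nan")


def mean_iou(pred, gt, classes):
    """Mean of per-class IoUs over `classes`; classes absent from both are skipped."""
    ious = []
    for c in classes:
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Flattened voxel labels; 0 is a hypothetical "free" id, 1 and 2 are semantic classes.
    pred = [0, 1, 1, 2, 0]
    gt = [0, 1, 2, 2, 1]
    print(occupancy_iou(pred, gt, free_label=0))
    print(mean_iou(pred, gt, classes=[1, 2]))
```

Passing a subset of class ids to `mean_iou` would give a subset-specific score, which is one plausible reading of the instance, scene, and human-centric mIoU columns.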
Key Points
- DeGO decouples rigid and nonrigid motion by combining offset updates with deformation updates under a learnable per-Gaussian rigidity mask.
- The method distills spatial and temporal features from the VGGT foundation model, using cross-camera and cross-frame guidance to improve feature alignment.
- Experiments on Occ3D-NuScenes show state-of-the-art weakly supervised occupancy performance, with notable gains on human-centric dynamic classes.
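The factorized distillation mentioned in the key points can be pictured as two separate feature-matching terms, one for cross-camera (spatial) features and one for cross-frame (temporal) features. The sketch below is a hedged illustration only: the cosine-distance objective, the weights `w_spatial` and `w_temporal`, and all function names are assumptions, and nothing here calls the real VGGT model. Teacher features are plain lists standing in for its outputs.

```python
import math


def cosine_distance(a, b, eps=1e-8):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def mean_cosine_distance(student, teacher):
    """Average cosine distance over paired student/teacher feature vectors."""
    pairs = list(zip(student, teacher))
    return sum(cosine_distance(s, t) for s, t in pairs) / len(pairs)


def factorized_distillation_loss(
    student_spatial,
    teacher_spatial,
    student_temporal,
    teacher_temporal,
    w_spatial=1.0,
    w_temporal=1.0,
):
    """Weighted sum of a spatial and a temporal matching term (assumed form)."""
    spatial = mean_cosine_distance(student_spatial, teacher_spatial)
    temporal = mean_cosine_distance(student_temporal, teacher_temporal)
    return w_spatial * spatial + w_temporal * temporal


if __name__ == "__main__":
    teacher_s = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for cross-camera teacher features
    teacher_t = [[1.0, 1.0]]              # stand-in for cross-frame teacher features
    student_s = [[2.0, 0.0], [0.0, 3.0]]  # same directions, so the spatial term is near 0
    student_t = [[1.0, -1.0]]             # orthogonal, so the temporal term is near 1
    print(factorized_distillation_loss(student_s, teacher_s, student_t, teacher_t))
```

Keeping the two terms separate lets each be weighted or ablated independently. Since the summary states that inference remains single-frame and feed-forward, the teacher features would presumably be needed only during training.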