FuguReport

Deformable Gaussian Occupancy: Decoupling Rigid and Nonrigid Motion with Factorized Distillation

Authors: Yang Gao, Wuyang Li, Po-Chien Luan, Alexandre Alahi
Affiliations: École Polytechnique Fédérale de Lausanne
Categories: Method / 3D Perception / Deformable occupancy for motion modeling; Evaluation / Benchmarking / Occ3D-NuScenes weakly supervised performance; Application / Autonomous Driving Perception / 3D occupancy motion estimation
License: CC BY 4.0

Abstract Overview

This paper studies weakly supervised 3D occupancy prediction for dynamic driving scenes, with particular attention to human-centric nonrigid motion that is not well handled by prior rigid-motion Gaussian methods. The proposed DeGO framework represents scenes with deformable Gaussian primitives whose motion is decoupled into rigid offsets and nonrigid deformation through a learnable rigidity mask. It also introduces factorized feature distillation from the VGGT foundation model to transfer cross-camera and cross-frame information into the Gaussian representation. On the Occ3D-NuScenes benchmark, the method improves temporal consistency and dynamic scene understanding while maintaining single-frame feed-forward inference at test time.

Novelty

The main novelty is the combination of decoupled Gaussian deformation and factorized 4D foundation-model distillation within a weakly supervised occupancy framework. A distinctive aspect is the per-Gaussian rigidity mask, which lets the model treat rigid structures and nonrigid agents differently instead of relying only on simple temporal offsets.
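
To make the role of the rigidity mask concrete, the sketch below treats it as a soft per-Gaussian gate that blends a rigid offset with a nonrigid deformation. This is an illustrative assumption rather than the paper's formulation: the function name `apply_decoupled_motion`, the sigmoid gating, and the additive blend are all hypothetical, and the summary does not state how the two motion terms are actually combined.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    """Map an unconstrained logit to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def apply_decoupled_motion(
    centers: List[List[float]],
    rigid_offsets: List[List[float]],
    nonrigid_deforms: List[List[float]],
    rigidity_logits: List[float],
) -> List[List[float]]:
    """Move each Gaussian center by a gated mix of two motion terms.

    Assumed (hypothetical) update for Gaussian i:
        new_center = center + m * rigid_offset + (1 - m) * nonrigid_deform
    where m = sigmoid(rigidity_logit) is the learnable rigidity mask.
    """
    n = len(centers)
    if not (len(rigid_offsets) == len(nonrigid_deforms) == len(rigidity_logits) == n):
        raise ValueError("all inputs must describe the same number of Gaussians")

    moved = []
    for center, rigid, deform, logit in zip(
        centers, rigid_offsets, nonrigid_deforms, rigidity_logits
    ):
        m = sigmoid(logit)  # close to 1: rigid structure, close to 0: nonrigid agent
        moved.append(
            [c + m * r + (1.0 - m) * d for c, r, d in zip(center, rigid, deform)]
        )
    return moved
```

Under this assumed blend, a Gaussian on a vehicle body would learn a mask near 1 and follow the rigid offset, while a Gaussian on a pedestrian's limb would learn a mask near 0 and follow the deformation term.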

Results

On Occ3D-NuScenes, DeGO achieves state-of-the-art weakly supervised results with 45.38 IoU, 18.05 mIoU, 10.34 instance mIoU, 33.46 scene mIoU, and 11.04 human-centric mIoU. Relative to the best prior methods, it gains 10.9% in overall mIoU and 13.5% in human-centric mIoU. Ablations confirm that both deformation modeling and VGGT-based distillation contribute significantly to performance.

Key Points

  1. DeGO decouples rigid and nonrigid motion by combining offset updates with deformation updates under a learnable per-Gaussian rigidity mask.
  2. The method distills spatial and temporal features from the VGGT foundation model, using cross-camera and cross-frame guidance to improve feature alignment.
  3. Experiments on Occ3D-NuScenes show state-of-the-art weakly supervised occupancy performance, particularly improving the representation of human-centric dynamic classes.
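
The factorized distillation in the second key point can be pictured as two separate feature-matching terms, one spatial (cross-camera) and one temporal (cross-frame), each comparing student features from the Gaussian representation against teacher features from VGGT. The sketch below is a minimal illustration under stated assumptions: the cosine-distance objective, the weighted sum, and the names `factorized_distillation_loss`, `w_spatial`, and `w_temporal` are hypothetical and are not taken from the paper.

```python
import math
from typing import List

def cosine_distance(a: List[float], b: List[float], eps: float = 1e-8) -> float:
    """Return 1 - cosine similarity; near 0 when the vectors are aligned."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)

def mean_cosine_distance(student: List[List[float]], teacher: List[List[float]]) -> float:
    """Average cosine distance over paired student/teacher feature vectors."""
    if not student or len(student) != len(teacher):
        raise ValueError("student and teacher must be non-empty and equally long")
    return sum(cosine_distance(s, t) for s, t in zip(student, teacher)) / len(student)

def factorized_distillation_loss(
    student_spatial: List[List[float]],
    teacher_spatial: List[List[float]],
    student_temporal: List[List[float]],
    teacher_temporal: List[List[float]],
    w_spatial: float = 1.0,   # hypothetical weight for the cross-camera term
    w_temporal: float = 1.0,  # hypothetical weight for the cross-frame term
) -> float:
    """Sum two independently computed distillation terms (assumed form)."""
    spatial_term = mean_cosine_distance(student_spatial, teacher_spatial)
    temporal_term = mean_cosine_distance(student_temporal, teacher_temporal)
    return w_spatial * spatial_term + w_temporal * temporal_term
```

Keeping the two terms separate means each can be weighted or ablated on its own, which is one plausible reading of "factorized" here.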
