SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization
- URL: http://arxiv.org/abs/2602.04271v1
- Date: Wed, 04 Feb 2026 07:00:44 GMT
- Title: SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization
- Authors: Lifan Wu, Ruijie Zhu, Yubo Ai, Tianzhu Zhang
- Abstract summary: SkeletonGaussian is a framework for generating editable dynamic 3D Gaussians from monocular video input. Our approach decomposes motion into sparse rigid motion explicitly driven by a skeleton and fine-grained non-rigid motion.
- Score: 25.299253655274594
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation field, which limits direct control and editability. To address this issue, we propose SkeletonGaussian, a novel framework for generating editable dynamic 3D Gaussians from monocular video input. Our approach introduces a hierarchical articulated representation that decomposes motion into sparse rigid motion explicitly driven by a skeleton and fine-grained non-rigid motion. Concretely, we extract a robust skeleton and drive rigid motion via linear blend skinning, followed by a hexplane-based refinement for non-rigid deformations, enhancing interpretability and editability. Experimental results demonstrate that SkeletonGaussian surpasses existing methods in generation quality while enabling intuitive motion editing, establishing a new paradigm for editable 4D generation. Project page: https://wusar.github.io/projects/skeletongaussian/
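The two-stage motion model described in the abstract (skeleton-driven rigid motion via linear blend skinning, then a non-rigid refinement) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-bone `(rotation, translation)` format, and the `residual` argument standing in for the hexplane-predicted offset are all hypothetical, and the bone transforms are assumed to map canonical positions directly to posed positions.

```python
import math


def rotation_z(angle):
    """Return a 3x3 rotation matrix about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_bone(rotation, translation, point):
    """Apply one bone's rigid transform: R @ p + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def linear_blend_skinning(point, weights, bones):
    """Blend the per-bone transformed positions using skinning weights.

    `weights` is assumed to be non-negative and to sum to 1.
    """
    blended = [0.0, 0.0, 0.0]
    for weight, (rotation, translation) in zip(weights, bones):
        moved = apply_bone(rotation, translation, point)
        blended = [b + weight * m for b, m in zip(blended, moved)]
    return blended


def deform(point, weights, bones, residual=(0.0, 0.0, 0.0)):
    """Coarse skeleton-driven motion plus a small non-rigid offset.

    `residual` is a hypothetical placeholder for a learned refinement.
    """
    rigid = linear_blend_skinning(point, weights, bones)
    return [r + d for r, d in zip(rigid, residual)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bones = [
    (identity, [0.0, 0.0, 0.0]),                # bone 0 stays in place
    (rotation_z(math.pi / 2), [0.0, 0.0, 0.0]),  # bone 1 rotates 90 degrees
]
center = [1.0, 0.0, 0.0]  # one Gaussian center in the canonical pose

print(deform(center, [1.0, 0.0], bones))  # follows bone 0 only
print(deform(center, [0.5, 0.5], bones))  # equal blend of both bones
```

Editing a pose in this sketch amounts to changing the entries of `bones`; the skinning weights and the residual are left untouched, which is the kind of explicit control the abstract contrasts with implicit deformation fields.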
Related papers
- Enhancing non-Rigid 3D Model Deformations Using Mesh-based Gaussian Splatting [0.0]
We propose a novel framework that enhances non-rigid 3D model deformations by bridging mesh representations with 3D Gaussian splatting. This work paves the way for more flexible 3D content-creation in applications spanning virtual reality, character animation, and interactive design.
arXiv Detail & Related papers (2025-07-09T16:26:04Z)
- RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos [50.37136267234771]
RigGS is a new paradigm that leverages 3D Gaussian representation and skeleton-based motion representation to model dynamic objects. Our method can generate realistic new actions easily for objects and achieve high-quality rendering.
arXiv Detail & Related papers (2025-03-21T03:27:07Z)
- MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting [56.785233997533794]
We propose a novel deformable 3D Gaussian splatting framework called MotionGS.
MotionGS explores explicit motion priors to guide the deformation of 3D Gaussians.
Experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-10-10T08:19:47Z)
- SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer [57.506654943449796]
We propose an efficient, sparse-controlled video-to-4D framework named SC4D that decouples motion and appearance.
Our method surpasses existing methods in both quality and efficiency.
We devise a novel application that seamlessly transfers motion onto a diverse array of 4D entities.
arXiv Detail & Related papers (2024-04-04T18:05:18Z)
- TC4D: Trajectory-Conditioned Text-to-4D Generation [94.90700997568158]
We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components.
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
arXiv Detail & Related papers (2024-03-26T17:55:11Z)
- MoDA: Modeling Deformable 3D Objects from Casual Videos [84.29654142118018]
We propose neural dual quaternion blend skinning (NeuDBS) to achieve 3D point deformation without skin-collapsing artifacts.
In the endeavor to register 2D pixels across different frames, we establish a correspondence between canonical feature embeddings that encodes 3D points within the canonical space.
Our approach can reconstruct 3D models for humans and animals with better qualitative and quantitative performance than state-of-the-art methods.
arXiv Detail & Related papers (2023-04-17T13:49:04Z)
- Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos [63.16888987770885]
This paper addresses the challenge of reconstructing an animatable human model from a multi-view video.
We introduce a pose-driven deformation field based on the linear blend skinning algorithm.
We show that our approach significantly outperforms recent human modeling methods.
arXiv Detail & Related papers (2022-03-15T17:56:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.