MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video
- URL: http://arxiv.org/abs/2405.12806v3
- Date: Sat, 22 Jun 2024 02:00:25 GMT
- Title: MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video
- Authors: Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu
- Abstract summary: Single-view clothed human reconstruction holds a central position in virtual reality applications.
Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion.
We introduce Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface.
- Score: 14.683309699648175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates the matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Additionally, to address local occlusions in single-view settings, UID builds on KGAS to identify significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve on Human NeRF and Gaussian Splatting by 33.94% and 16.75% in LPIPS* respectively. Codes are available at https://wanghongsheng01.github.io/MOSS/.
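To make the core mechanism concrete, here is a minimal numpy sketch (our illustration, not the authors' code) of how a matrix-Fisher distribution over rotations, p(R) ∝ exp(tr(FᵀR)), can supply both a mode rotation and per-axis concentrations that control a 3D Gaussian's orientation and anisotropic scale; the concentration-to-scale mapping below is a hypothetical choice.

```python
# Sketch: deriving a Gaussian's rotation and scale from a matrix-Fisher
# parameter matrix F. Not the authors' implementation.
import numpy as np

def gaussian_from_matrix_fisher(F, base_scale=0.01):
    # Proper SVD of the parameter matrix F = U diag(s) V^T.
    U, s, Vt = np.linalg.svd(F)
    # Mode of the distribution: the most likely rotation, kept in SO(3).
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R_mode = U @ D @ Vt
    # Larger singular values mean higher concentration (less rotational
    # uncertainty), so we shrink the Gaussian along confident axes.
    # This mapping is an illustrative assumption.
    scales = base_scale / (1.0 + s)
    # Covariance as in 3DGS: Sigma = R S S^T R^T.
    S = np.diag(scales)
    cov = R_mode @ S @ S.T @ R_mode.T
    return R_mode, scales, cov

F = np.array([[5.0, 0.1, 0.0],
              [0.0, 3.0, 0.2],
              [0.1, 0.0, 1.0]])  # e.g., a per-joint kinematic parameter matrix
R, scales, cov = gaussian_from_matrix_fisher(F)
```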
Related papers
- Expressive Gaussian Human Avatars from Monocular RGB Video [69.56388194249942]
We introduce EVA, a drivable human model that meticulously sculpts fine details based on 3D Gaussians and SMPL-X.
We highlight the critical importance of aligning the SMPL-X model with RGB frames for effective avatar learning.
We propose a context-aware adaptive density control strategy, which adaptively adjusts the gradient thresholds, as sketched below.
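A hedged sketch of what adaptive gradient-threshold densification can look like in a 3DGS pipeline; the percentile rule is our assumption, and EVA's exact strategy is described in the paper.

```python
# Sketch: densify the Gaussians whose accumulated positional gradient
# exceeds a threshold that adapts to the current gradient distribution.
import torch

def densify_mask(pos_grads: torch.Tensor, pct: float = 90.0) -> torch.Tensor:
    """pos_grads: (N,) accumulated positional gradient norms per Gaussian."""
    # Adaptive threshold: follow the gradient statistics instead of a
    # hand-tuned constant, so detail-rich regions split more readily.
    thresh = torch.quantile(pos_grads, pct / 100.0)
    return pos_grads > thresh  # Gaussians selected for clone/split
```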
arXiv Detail & Related papers (2024-07-03T15:36:27Z)
- Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs [15.017274891943162]
Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision.
Inertial sensors have been introduced to provide a complementary source of information.
It remains challenging to integrate heterogeneous sensor data for producing physically rational 3D human poses.
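A toy illustration of one way heterogeneous sources could be fused, assuming per-joint orientations from both branches and a visibility-based confidence; this is not the paper's method.

```python
# Sketch: confidence-weighted normalized quaternion interpolation between
# a vision-based and an IMU-based joint orientation.
import numpy as np

def fuse_orientations(q_video, q_imu, video_conf):
    # Align hemispheres so interpolation takes the short path.
    if np.dot(q_video, q_imu) < 0:
        q_imu = -q_imu
    q = video_conf * q_video + (1.0 - video_conf) * q_imu
    return q / np.linalg.norm(q)  # unit quaternion (w, x, y, z)

q = fuse_orientations(np.array([1.0, 0.0, 0.0, 0.0]),
                      np.array([0.98, 0.0, 0.199, 0.0]), video_conf=0.3)
```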
arXiv Detail & Related papers (2024-04-27T09:02:42Z)
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured human-scene interaction (HSI) dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
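The any-length property typically comes from autoregressive windowing; the sketch below illustrates that general pattern with a placeholder denoiser standing in for the trained diffusion sampler (all names and sizes are our assumptions, not TRUMANS code).

```python
# Sketch: chain fixed-size motion windows, each generated conditioned on
# the tail of the previous window, to reach arbitrary sequence length.
import numpy as np

def denoise(noise, context):
    # Placeholder for a learned reverse-diffusion sampler that maps
    # Gaussian noise plus past-motion context to a clean motion window.
    return 0.5 * noise + 0.5 * context.mean(axis=0, keepdims=True)

def generate(total_len, win=32, overlap=8, dim=66, rng=np.random.default_rng(0)):
    motion = denoise(rng.standard_normal((win, dim)), np.zeros((overlap, dim)))
    while motion.shape[0] < total_len:
        ctx = motion[-overlap:]                      # condition on the tail
        nxt = denoise(rng.standard_normal((win, dim)), ctx)
        motion = np.concatenate([motion, nxt], axis=0)
    return motion[:total_len]

seq = generate(200)  # 200 frames of 66-D pose features
```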
arXiv Detail & Related papers (2024-03-13T15:45:04Z)
- Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering [41.589093951039814]
We integrate physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS.
Our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views.
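The coupling idea can be illustrated in a few lines: step a particle simulation, then move the bound splats to follow it. The one-to-one binding and bare gravity integrator below are simplifying assumptions, not the paper's solver.

```python
# Sketch: Gaussians bound to physics particles inherit their motion.
import numpy as np

def step(pos, vel, dt=1/60, g=np.array([0.0, -9.81, 0.0])):
    vel = vel + dt * g          # symplectic Euler, free fall only
    pos = pos + dt * vel
    return pos, vel

particles = np.zeros((100, 3)); velocities = np.zeros((100, 3))
gaussian_means = particles.copy()       # splats bound one-to-one to particles
particles, velocities = step(particles, velocities)
gaussian_means[:] = particles           # splats follow the simulation
```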
arXiv Detail & Related papers (2024-01-27T06:45:22Z)
- MACS: Mass Conditioned 3D Hand and Object Motion Synthesis [68.40728343078257]
The physical properties of an object, such as mass, significantly affect how we manipulate it with our hands.
This work proposes MACS, the first MAss Conditioned 3D hand and object motion Synthesis approach.
Our approach is based on cascaded diffusion models and generates interactions that plausibly adjust based on the object mass and interaction type.
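A hedged sketch of scalar mass conditioning for a motion diffusion model; layer sizes and names are hypothetical, and MACS's cascaded architecture is described in the paper.

```python
# Sketch: embed object mass and interaction type into a conditioning
# vector that a diffusion denoiser would consume.
import torch, torch.nn as nn

class MassCondition(nn.Module):
    def __init__(self, n_types=4, dim=64):
        super().__init__()
        self.mass_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.type_emb = nn.Embedding(n_types, dim)

    def forward(self, mass, itype):
        # mass: (B, 1) in kg; itype: (B,) interaction-type index
        return self.mass_mlp(mass) + self.type_emb(itype)

cond = MassCondition()(torch.tensor([[0.3]]), torch.tensor([1]))
```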
arXiv Detail & Related papers (2023-12-22T18:59:54Z)
- GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction [61.833152949826946]
We propose a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, named GraMMaR.
GraMMaR learns the distribution of transitions in both the pose and the interaction between every joint and the ground plane at each time step of a motion sequence.
It is trained to explicitly promote consistency between the motion and the change in distance to the ground.
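One way to encode such a consistency objective, under our own simplifying assumptions (the paper's formulation is richer), is an L1 penalty tying a separately predicted joint-ground distance to the predicted vertical motion.

```python
# Sketch: penalize disagreement between two predictions — how far each
# joint moved vertically vs. how much its ground distance changed.
import torch

def ground_consistency_loss(pred_joints, pred_ground_dist):
    # pred_joints: (T, J, 3) predicted joint positions, y-up.
    # pred_ground_dist: (T, J) separately predicted joint-ground distances.
    vertical_motion = pred_joints[1:, :, 1] - pred_joints[:-1, :, 1]
    dist_change = pred_ground_dist[1:] - pred_ground_dist[:-1]
    return (dist_change - vertical_motion).abs().mean()

loss = ground_consistency_loss(torch.randn(8, 24, 3), torch.rand(8, 24))
```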
arXiv Detail & Related papers (2023-06-29T07:22:20Z)
- MoDA: Modeling Deformable 3D Objects from Casual Videos [84.29654142118018]
We propose neural dual quaternion blend skinning (NeuDBS) to achieve 3D point deformation without skin-collapsing artifacts.
To register 2D pixels across different frames, we establish correspondences through canonical feature embeddings that encode 3D points within the canonical space.
Our approach can reconstruct 3D models for humans and animals with better qualitative and quantitative performance than state-of-the-art methods.
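For context, classical dual quaternion blending, which NeuDBS builds on, blends rigid transforms as dual quaternions so that skin does not collapse under large rotations. Below is a textbook numpy sketch of that baseline, not the paper's neural variant.

```python
# Sketch: dual quaternion blend skinning (DQB) for one point.
import numpy as np

def qmul(a, b):  # Hamilton product, (w, x, y, z)
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def skin_point(p, dual_quats, weights):
    # dual_quats: list of (q_real, q_dual) per bone; weights sum to 1.
    pivot = dual_quats[0][0]
    qr = np.zeros(4); qd = np.zeros(4)
    for (r, d), w in zip(dual_quats, weights):
        s = np.sign(np.dot(pivot, r)) or 1.0   # antipodal alignment
        qr += w * s * r; qd += w * s * d
    n = np.linalg.norm(qr); qr, qd = qr / n, qd / n   # renormalize
    # Rotate p by qr, then add translation t = 2 * qd * conj(qr).
    conj = qr * np.array([1, -1, -1, -1])
    rp = qmul(qmul(qr, np.array([0.0, *p])), conj)[1:]
    t = 2.0 * qmul(qd, conj)[1:]
    return rp + t

# Sanity check: blending two identity transforms leaves the point fixed.
dq_id = (np.array([1.0, 0, 0, 0]), np.zeros(4))
p_new = skin_point(np.array([0.1, 0.2, 0.0]), [dq_id, dq_id], [0.5, 0.5])
```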
arXiv Detail & Related papers (2023-04-17T13:49:04Z)
- MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
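A minimal sketch of this pretraining objective under simplifying assumptions: mask and corrupt 2D keypoints, recover 3D motion with an encoder (a tiny MLP stands in for DSTformer), and supervise with a mean per-joint position error.

```python
# Sketch: masked 2D-to-3D lifting as a pretraining step.
import torch, torch.nn as nn

T, J = 16, 17                       # frames, joints
encoder = nn.Sequential(            # stand-in for DSTformer
    nn.Flatten(), nn.Linear(T*J*2, 512), nn.GELU(), nn.Linear(512, T*J*3))

kp2d = torch.randn(1, T, J, 2)      # noisy 2D observations
mask = (torch.rand(1, T, J, 1) > 0.15).float()  # drop ~15% of keypoints
gt3d = torch.randn(1, T, J, 3)      # target 3D motion (e.g., from mocap)

pred = encoder(kp2d * mask).view(1, T, J, 3)
loss = (pred - gt3d).norm(dim=-1).mean()   # mean per-joint position error
loss.backward()
```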
arXiv Detail & Related papers (2022-10-12T19:46:25Z)
- SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements [62.652588951757764]
Learning to model and reconstruct humans in clothing is challenging due to articulation, non-rigid deformation, and varying clothing types and topologies.
Recent work uses neural networks to parameterize local surface elements.
We present three key innovations: First, we deform surface elements based on a human body model.
Second, we address the limitations of existing neural surface elements by regressing local geometry from local features.
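The local-element idea can be sketched as a shared MLP that maps a per-patch feature plus in-patch UV coordinates to 3D points, so each element regresses its own local geometry; sizes and names below are illustrative assumptions.

```python
# Sketch: regressing local geometry from local features, one patch at a time.
import torch, torch.nn as nn

patch_mlp = nn.Sequential(nn.Linear(64 + 2, 128), nn.ReLU(), nn.Linear(128, 3))

K, P = 100, 16                         # patches on the body, points per patch
feats = torch.randn(K, 64)             # learned local feature per patch
uv = torch.rand(K, P, 2)               # sample locations within each patch
inp = torch.cat([feats[:, None].expand(-1, P, -1), uv], dim=-1)
offsets = patch_mlp(inp)               # (K, P, 3) local geometry per patch
```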
arXiv Detail & Related papers (2021-04-15T17:59:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.