MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks
- URL: http://arxiv.org/abs/2112.10082v2
- Date: Tue, 21 Dec 2021 09:16:15 GMT
- Title: MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks
- Authors: Wentao Zhu, Zhuoqian Yang, Ziang Di, Wayne Wu, Yizhou Wang, Chen Change Loy
- Abstract summary: We present a novel framework that brings the 3D motion task from controlled environments to in-the-wild scenarios.
It is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
- Score: 77.56526918859345
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present a novel framework that brings the 3D motion retargeting task from
controlled environments to in-the-wild scenarios. In particular, our method is
capable of retargeting body motion from a character in a 2D monocular video to
a 3D character without using any motion capture system or 3D reconstruction
procedure. It is designed to leverage massive online videos for unsupervised
training, without the need for 3D annotations or motion-body pairing information. The
proposed method is built upon two novel canonicalization operations, structure
canonicalization and view canonicalization. Trained with the canonicalization
operations and the derived regularizations, our method learns to factorize a
skeleton sequence into three independent semantic subspaces, i.e., motion,
structure, and view angle. The disentangled representation enables motion
retargeting from 2D to 3D with high precision. Our method achieves superior
performance on motion transfer benchmarks with large body variations and
challenging actions. Notably, the canonicalized skeleton sequence could serve
as a disentangled and interpretable representation of human motion that
benefits action analysis and motion retrieval.
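As a quick illustration of the factorization described in the abstract, the sketch below lays out three parallel encoders that map a 2D skeleton sequence to motion, structure, and view-angle codes, plus a decoder that recombines them into a 3D skeleton sequence. It is a minimal, hypothetical PyTorch sketch: the module names, layer choices, and dimensions are assumptions, not the released MoCaNet architecture, and the canonicalization operations and their losses are omitted.

```python
import torch
import torch.nn as nn

class DisentangledSkeletonEncoder(nn.Module):
    """Hypothetical sketch: factorize a 2D skeleton sequence into motion,
    structure (body), and view-angle codes, then decode a 3D skeleton
    sequence from the recombined codes. All sizes are illustrative."""

    def __init__(self, n_joints=15, d_motion=128, d_structure=64, d_view=8):
        super().__init__()
        in_ch = n_joints * 2  # (x, y) per joint, flattened per frame
        self.motion_enc = nn.Sequential(  # time-varying code
            nn.Conv1d(in_ch, d_motion, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(d_motion, d_motion, kernel_size=7, padding=3))
        self.structure_enc = nn.Sequential(  # roughly time-invariant: pool over frames
            nn.Conv1d(in_ch, d_structure, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.view_enc = nn.Sequential(
            nn.Conv1d(in_ch, d_view, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.decoder = nn.Conv1d(d_motion + d_structure + d_view,
                                 n_joints * 3, kernel_size=7, padding=3)

    def forward(self, seq_2d):
        # seq_2d: (batch, n_joints * 2, n_frames)
        m = self.motion_enc(seq_2d)
        s = self.structure_enc(seq_2d).expand(-1, -1, m.shape[-1])
        v = self.view_enc(seq_2d).expand(-1, -1, m.shape[-1])
        return self.decoder(torch.cat([m, s, v], dim=1)), (m, s, v)

# Retargeting then amounts to decoding the motion code of one clip together
# with the structure and view codes of another; the structure/view
# canonicalization losses that keep the codes disentangled are not shown.
```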
Related papers
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
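As a loose illustration of the idea, the snippet below sketches per-object occupancy over a voxel grid; the slot-wise softmax and the grid shapes are assumptions for illustration, not DynaVol-S's actual parameterization.

```python
import torch

# Minimal sketch (assumed shapes): one grid of logits per object slot; a
# softmax across slots yields per-object occupancy probabilities at every
# spatial location of the voxel grid.
n_objects, res = 4, 32
object_logits = torch.randn(n_objects, res, res, res)    # learnable in practice
occupancy = torch.softmax(object_logits, dim=0)           # per-voxel distribution over objects

# Attaching per-slot semantic features gives multiple disentangled voxel
# grids that together represent the scene.
semantic_dim = 16
semantic_grids = torch.randn(n_objects, semantic_dim, res, res, res)
scene_features = (occupancy.unsqueeze(1) * semantic_grids).sum(dim=0)  # (16, 32, 32, 32)
print(scene_features.shape)
```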
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases.
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
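To make the "compact set of SE3 motion bases" concrete, here is a small Python sketch in the spirit of linear blend skinning: each point's trajectory is a per-point weighted blend of a few per-frame rigid transforms. The blending scheme and tensor shapes are assumptions, not the paper's exact formulation.

```python
import torch

def blend_se3_bases(points, basis_transforms, weights):
    """Sketch: scene motion from a compact set of rigid (SE(3)) bases, in the
    spirit of linear blend skinning; the blending scheme is an assumption.

    points:           (N, 3)       canonical 3D points
    basis_transforms: (T, K, 4, 4) one rigid transform per basis per frame
    weights:          (N, K)       per-point mixing weights (rows sum to 1)
    returns:          (T, N, 3)    point trajectories over T frames
    """
    pts_h = torch.cat([points, torch.ones(points.shape[0], 1)], dim=1)    # (N, 4)
    blended = torch.einsum("nk,tkij->tnij", weights, basis_transforms)    # (T, N, 4, 4)
    moved = torch.einsum("tnij,nj->tni", blended, pts_h)                  # (T, N, 4)
    return moved[..., :3]

# Toy usage: 2 bases, 3 frames, 5 points; the second basis translates along x.
T, K, N = 3, 2, 5
bases = torch.eye(4).repeat(T, K, 1, 1)
bases[:, 1, 0, 3] = torch.linspace(0.0, 1.0, T)
w = torch.rand(N, K)
w = w / w.sum(dim=1, keepdim=True)
print(blend_se3_bases(torch.rand(N, 3), bases, w).shape)  # torch.Size([3, 5, 3])
```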
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- Generating Continual Human Motion in Diverse 3D Scenes [56.70255926954609]
We introduce a method to synthesize animator-guided human motion across 3D scenes.
We decompose the continual motion synthesis problem into walking along paths and transitioning in and out of the actions specified by the keypoints.
Our model can generate long sequences of diverse actions such as grabbing, sitting and leaning chained together.
arXiv Detail & Related papers (2023-04-04T18:24:22Z)
- MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
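To make the pretraining objective concrete, the snippet below sketches one training step that recovers 3D motion from masked, noise-corrupted 2D keypoints. The corruption ratios and the tiny GRU stand-in for DSTformer are assumptions for illustration only.

```python
import torch
import torch.nn as nn

def corrupt_2d(keypoints_2d, mask_ratio=0.15, noise_std=0.02):
    """Hypothetical corruption for the 2D-to-3D pretraining objective:
    random joint masking plus Gaussian noise (ratios are assumed)."""
    noisy = keypoints_2d + noise_std * torch.randn_like(keypoints_2d)
    mask = (torch.rand(keypoints_2d.shape[:-1]) < mask_ratio).unsqueeze(-1)
    return noisy.masked_fill(mask, 0.0), mask

class TinyMotionEncoder(nn.Module):
    """Stand-in for the paper's DSTformer: any sequence model mapping
    (batch, frames, joints, 2) -> (batch, frames, joints, 3)."""
    def __init__(self, n_joints=17, d_model=64):
        super().__init__()
        self.inp = nn.Linear(n_joints * 2, d_model)
        self.gru = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, n_joints * 3)

    def forward(self, x2d):
        b, t, j, _ = x2d.shape
        h, _ = self.gru(self.inp(x2d.reshape(b, t, j * 2)))
        return self.out(h).reshape(b, t, j, 3)

# One pretraining step (sketch): recover 3D motion from corrupted 2D input.
enc = TinyMotionEncoder()
x2d = torch.randn(4, 30, 17, 2)   # fake 2D keypoint sequences
x3d = torch.randn(4, 30, 17, 3)   # fake 3D targets (e.g. from mocap)
corrupted, _ = corrupt_2d(x2d)
loss = nn.functional.mse_loss(enc(corrupted), x3d)
loss.backward()
```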
arXiv Detail & Related papers (2022-10-12T19:46:25Z)
- Action2video: Generating Videos of Human 3D Actions [31.665831044217363]
We aim to tackle the interesting yet challenging problem of generating videos of diverse and natural human motions from prescribed action categories.
The key issue lies in the ability to synthesize multiple distinct motion sequences that are realistic in their visual appearances.
Action2motion generates plausible 3D pose sequences of a prescribed action category, which are processed and rendered by motion2video to form 2D videos.
arXiv Detail & Related papers (2021-11-12T20:20:37Z)
- Neural Monocular 3D Human Motion Capture with Physical Awareness [76.55971509794598]
We present a new trainable system for physically plausible markerless 3D human motion capture.
Unlike most neural methods for human motion capture, our approach is aware of physical and environmental constraints.
It produces smooth and physically principled 3D motions at an interactive frame rate in a wide variety of challenging scenes.
arXiv Detail & Related papers (2021-05-03T17:57:07Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision enforces the consistency of consecutive 3D reconstructions via a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
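One plausible (assumed) instantiation of such a motion-based cycle loss, written for per-vertex 3D motion estimated between consecutive frames, could look like the following; the paper's exact formulation may differ.

```python
import torch

def motion_cycle_loss(verts_t, verts_t1, flow_fwd, flow_bwd):
    """Assumed sketch of motion-based cycle consistency between consecutive
    3D reconstructions: moving frame t forward by the estimated motion should
    land on frame t+1, and composing forward and backward motion should
    return to the start.

    verts_t, verts_t1:  (V, 3) reconstructed surface points at frames t and t+1
    flow_fwd, flow_bwd: (V, 3) estimated per-vertex 3D motion t->t+1 and t+1->t
    """
    forward_consistency = torch.nn.functional.l1_loss(verts_t + flow_fwd, verts_t1)
    cycle = (flow_fwd + flow_bwd).abs().mean()   # the round trip should cancel out
    return forward_consistency + cycle

# Toy usage with random stand-ins for two consecutive reconstructions.
v_t, v_t1 = torch.randn(100, 3), torch.randn(100, 3)
f_fwd = torch.randn(100, 3, requires_grad=True)
f_bwd = torch.randn(100, 3, requires_grad=True)
motion_cycle_loss(v_t, v_t1, f_fwd, f_bwd).backward()
```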
arXiv Detail & Related papers (2021-03-30T13:50:27Z)
- Motion Guided 3D Pose Estimation from Videos [81.14443206968444]
We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D poses.
In computing the motion loss, a simple yet effective representation for keypoint motion, called pairwise motion encoding, is introduced.
We design a new graph convolutional network architecture, U-shaped GCN (UGCN), which captures both short-term and long-term motion information.
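For intuition, here is a hedged sketch of what a pairwise motion encoding and the accompanying motion loss could look like: pairwise joint offsets differentiated over time, compared between predicted and ground-truth sequences. The exact encoding in the paper may differ.

```python
import torch

def pairwise_motion_encoding(poses):
    """Sketch of a pairwise motion encoding (details are an assumption):
    offsets between every pair of keypoints, differentiated over time, so
    the code describes how relative joint positions move.

    poses:   (batch, frames, joints, dims) 2D or 3D keypoints
    returns: (batch, frames - 1, joints, joints, dims)
    """
    offsets = poses.unsqueeze(3) - poses.unsqueeze(2)   # pairwise joint offsets
    return offsets[:, 1:] - offsets[:, :-1]             # temporal difference

def motion_loss(pred_3d, gt_3d):
    """Penalize mismatch between the motion encodings of predicted and
    ground-truth sequences (used alongside a per-frame position loss)."""
    return torch.nn.functional.l1_loss(
        pairwise_motion_encoding(pred_3d), pairwise_motion_encoding(gt_3d))

# Toy usage with random tensors standing in for predicted / ground-truth 3D poses.
pred = torch.randn(2, 16, 17, 3, requires_grad=True)
gt = torch.randn(2, 16, 17, 3)
motion_loss(pred, gt).backward()
```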
arXiv Detail & Related papers (2020-04-29T06:59:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.