Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
- URL: http://arxiv.org/abs/2303.08658v1
- Date: Wed, 15 Mar 2023 14:41:26 GMT
- Title: Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
- Authors: Jiaxu Zhang, Junwu Weng, Di Kang, Fang Zhao, Shaoli Huang, Xuefei Zhe, Linchao Bao, Ying Shan, Jue Wang and Zhigang Tu
- Abstract summary: A good motion retargeting cannot be reached without consideration of source-target differences on both the skeleton and shape geometry levels.
We propose a novel Residual RETargeting network (R2ET) structure, which relies on two neural modification modules.
Experiments on the public dataset Mixamo demonstrate that our R2ET achieves state-of-the-art performance.
- Score: 34.53794943807786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A good motion retargeting cannot be reached without reasonable consideration
of source-target differences on both the skeleton and shape geometry levels. In
this work, we propose a novel Residual RETargeting network (R2ET) structure,
which relies on two neural modification modules, to adjust the source motions
to fit the target skeletons and shapes progressively. In particular, a
skeleton-aware module is introduced to preserve the source motion semantics. A
shape-aware module is designed to perceive the geometries of target characters
to reduce interpenetration and contact-missing. Driven by our explored
distance-based losses that explicitly model the motion semantics and geometry,
these two modules can learn residual motion modifications on the source motion
to generate plausible retargeted motion in a single inference without
post-processing. To balance these two modifications, we further present a
balancing gate to conduct linear interpolation between them. Extensive
experiments on the public dataset Mixamo demonstrate that our R2ET achieves
state-of-the-art performance and provides a good balance between the
preservation of motion semantics and the attenuation of interpenetration
and contact-missing. Code is available at https://github.com/Kebii/R2ET.
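As a rough illustration of the structure described above, here is a minimal, hypothetical sketch of the residual formulation in PyTorch. The layer shapes, the sigmoid gate, and the exact interpolation form are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class ResidualRetargeter(nn.Module):
    """Hypothetical sketch of a residual retargeting structure.

    Two modules predict residual corrections on top of the copied source
    rotations; a balancing gate linearly interpolates between them.
    Shapes and layer choices are illustrative only.
    """

    def __init__(self, num_joints: int, rot_dim: int = 4,
                 skel_dim: int = 32, shape_dim: int = 64):
        super().__init__()
        in_dim = num_joints * rot_dim
        # Skeleton-aware module: perceives source/target skeleton differences
        # and predicts a residual that preserves motion semantics.
        self.skeleton_net = nn.Sequential(
            nn.Linear(in_dim + skel_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))
        # Shape-aware module: perceives target body geometry and predicts a
        # residual that reduces interpenetration and contact-missing.
        self.shape_net = nn.Sequential(
            nn.Linear(in_dim + shape_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))
        # Balancing gate: a per-frame weight in [0, 1].
        self.gate = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, src_rot, skel_feat, shape_feat):
        # src_rot: (batch, num_joints * rot_dim) source rotations for one frame
        delta_skel = self.skeleton_net(torch.cat([src_rot, skel_feat], dim=-1))
        delta_shape = self.shape_net(torch.cat([src_rot, shape_feat], dim=-1))
        g = self.gate(src_rot)  # (batch, 1)
        # Linear interpolation between the two residual modifications,
        # applied on top of the copied source motion.
        return src_rot + (1.0 - g) * delta_skel + g * delta_shape
```

The point being illustrated is that both modules output residuals added to the copied source motion, so a retargeted pose is produced in a single forward pass without post-processing.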
Related papers
- Semantics-aware Motion Retargeting with Vision-Language Models [19.53696208117539]
We present a novel Semantics-aware Motion reTargeting (SMT) method with the advantage of vision-language models to extract and maintain meaningful motion semantics.
We utilize a differentiable module to render the 3D motions, and the high-level motion semantics are incorporated into the retargeting process by feeding the rendered results to the vision-language model and aligning the extracted semantic embeddings.
To ensure the preservation of fine-grained motion details and high-level semantics, we adopt a two-stage pipeline consisting of skeleton-aware pre-training and fine-tuning with semantics and geometry constraints.
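A rough sketch of the embedding-alignment idea follows; the renderer, the vision-language encoder, and the cosine-based loss form are placeholders assumed for illustration, not the paper's actual components.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(render_fn, vlm_encode, src_motion, tgt_motion):
    """Toy semantics-preservation loss in the spirit of the SMT summary.

    render_fn:  differentiable renderer mapping a motion to image frames.
    vlm_encode: frozen vision-language image encoder returning embeddings.
    Both callables are assumed placeholders.
    """
    src_frames = render_fn(src_motion)   # (frames, 3, H, W)
    tgt_frames = render_fn(tgt_motion)   # (frames, 3, H, W)
    with torch.no_grad():                # source embeddings act as fixed targets
        src_emb = F.normalize(vlm_encode(src_frames), dim=-1)
    tgt_emb = F.normalize(vlm_encode(tgt_frames), dim=-1)
    # 1 - cosine similarity, averaged over frames; gradients flow back
    # through the differentiable renderer into the retargeted motion.
    return (1.0 - (src_emb * tgt_emb).sum(dim=-1)).mean()
```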
arXiv Detail & Related papers (2023-12-04T15:23:49Z)
- Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer [19.5025303182983]
Video-based human pose transfer is a video-to-video generation task that animates a plain source human image based on a series of target human poses.
We propose a novel Deformable Motion Modulation (DMM) that utilizes geometric kernel offset with adaptive weight modulation to simultaneously perform discontinuous feature alignment and style transfer.
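The "geometric kernel offset with adaptive weight modulation" is in the spirit of modulated deformable convolution; below is a small illustrative sketch using torchvision's deform_conv2d, an assumed stand-in rather than the paper's DMM block.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class ModulatedDeformBlock(nn.Module):
    """Illustrative modulated deformable convolution (DCNv2-style).

    Geometric kernel offsets and per-sample modulation weights are predicted
    from the input feature map; this sketches the general mechanism the DMM
    summary refers to, not the paper's exact module.
    """

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
        self.offset_pred = nn.Conv2d(channels, 2 * k * k, k, padding=k // 2)
        self.mask_pred = nn.Conv2d(channels, k * k, k, padding=k // 2)

    def forward(self, x):
        offset = self.offset_pred(x)             # per-location sampling offsets
        mask = torch.sigmoid(self.mask_pred(x))  # adaptive modulation weights
        return deform_conv2d(x, offset, self.weight, padding=self.k // 2, mask=mask)
```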
arXiv Detail & Related papers (2023-07-15T09:24:45Z)
- Correspondence-free online human motion retargeting [1.7008985510992145]
We present a data-driven framework for unsupervised human motion retargeting that animates a target subject with the motion of a source subject.
Our method is correspondence-free, requiring neither correspondences between the source and target shapes nor temporal correspondences between different frames of the source motion.
This allows animating a target shape with arbitrary sequences of humans in motion, possibly captured using 4D acquisition platforms or consumer devices.
arXiv Detail & Related papers (2023-02-01T16:23:21Z)
- MoDi: Unconditional Motion Synthesis from Diverse Data [51.676055380546494]
We present MoDi, an unconditional generative model that synthesizes diverse motions.
Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset.
We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered.
arXiv Detail & Related papers (2022-06-16T09:06:25Z)
- JNMR: Joint Non-linear Motion Regression for Video Frame Interpolation [47.123769305867775]
Video frame interpolation (VFI) aims to generate frames by warping learnable motions from the bidirectional historical references.
We reformulate VFI as a Joint Non-linear Motion Regression (JNMR) strategy to model the complicated inter-frame motions.
We show the effectiveness and significant improvement of joint motion regression compared with state-of-the-art methods.
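To give a flavor of non-linear inter-frame motion regression, here is a toy sketch that fits a per-pixel polynomial trajectory to reference flows and evaluates it at the target time; the polynomial basis and least-squares fit are assumptions for illustration, not the JNMR architecture.

```python
import torch

def regress_intermediate_flow(flows, times, t_mid=0.5, degree=2):
    """Toy non-linear motion regression for frame interpolation.

    flows: (n_refs, 2, H, W) flows measured at the timestamps in `times`;
    a degree-`degree` polynomial trajectory is fit per pixel and evaluated
    at `t_mid`. Illustrative only; not the JNMR architecture.
    """
    n, c, h, w = flows.shape
    t = torch.as_tensor(times, dtype=flows.dtype)
    # Polynomial design matrix: (n_refs, degree + 1)
    A = torch.stack([t ** p for p in range(degree + 1)], dim=1)
    y = flows.reshape(n, -1)                              # (n_refs, c*h*w)
    # Least-squares polynomial coefficients for every pixel and channel.
    coeffs = torch.linalg.lstsq(A, y).solution            # (degree + 1, c*h*w)
    basis = torch.tensor([t_mid ** p for p in range(degree + 1)], dtype=flows.dtype)
    return (basis @ coeffs).reshape(c, h, w)              # flow at time t_mid
```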
arXiv Detail & Related papers (2022-06-09T02:47:29Z)
- MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios.
It is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z)
- Unsupervised Motion Representation Learning with Capsule Autoencoders [54.81628825371412]
Motion Capsule Autoencoder (MCAE) models motion in a two-level hierarchy.
MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets.
arXiv Detail & Related papers (2021-10-01T16:52:03Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
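The idea of mining interactions among only a few selected foreground regions can be illustrated with the toy sketch below, which picks the top-k locations by a foreground score and mixes them with one Transformer encoder layer; all names and shapes are assumptions, not the EAN design.

```python
import torch
import torch.nn as nn

class ForegroundTokenMixer(nn.Module):
    """Toy aggregation over a few selected foreground locations.

    Top-k spatial locations (by a foreground score) are treated as tokens
    and mixed with a single Transformer encoder layer. Purely illustrative.
    """

    def __init__(self, channels: int, k: int = 8, nhead: int = 4):
        super().__init__()
        self.k = k
        # channels must be divisible by nhead
        self.mixer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=nhead, batch_first=True)

    def forward(self, feat_map, scores):
        # feat_map: (batch, channels, H, W); scores: (batch, H, W)
        b, c, h, w = feat_map.shape
        tokens = feat_map.flatten(2).transpose(1, 2)          # (b, H*W, c)
        idx = scores.flatten(1).topk(self.k, dim=1).indices   # (b, k)
        selected = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, c))
        return self.mixer(selected).mean(dim=1)               # (b, c) global cue
```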
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections while still suffering from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
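A minimal sketch of the two enhancement ideas, assuming hypothetical shapes and layers (not the paper's modules): a channel-wise gate computed from adjacent-frame feature differences, and a spatial weight derived from point-to-point similarity between adjacent feature maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelMotionGate(nn.Module):
    """Toy channel-wise motion gate in the spirit of the CME description."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, feat_t, feat_t1):
        # feat_t, feat_t1: (batch, channels, H, W) features of adjacent frames
        diff = (feat_t1 - feat_t).mean(dim=(2, 3))    # (batch, channels) motion cue
        gate = self.fc(diff).unsqueeze(-1).unsqueeze(-1)
        return feat_t * gate                          # emphasize dynamic channels


def spatial_motion_weight(feat_t, feat_t1):
    """Toy spatial re-weighting via point-to-point similarity (SME-style idea)."""
    sim = F.cosine_similarity(feat_t, feat_t1, dim=1, eps=1e-6)  # (batch, H, W)
    return feat_t * (1.0 - sim).unsqueeze(1)   # low similarity marks motion regions
```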
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.