Motion and Appearance Adaptation for Cross-Domain Motion Transfer
- URL: http://arxiv.org/abs/2209.14529v1
- Date: Thu, 29 Sep 2022 03:24:47 GMT
- Title: Motion and Appearance Adaptation for Cross-Domain Motion Transfer
- Authors: Borun Xu, Biao Wang, Jinhong Deng, Jiale Tao, Tiezheng Ge, Yuning
Jiang, Wen Li, Lixin Duan
- Abstract summary: Motion transfer aims to transfer the motion of a driving video to a source image.
Traditional single-domain motion transfer approaches often produce notable artifacts.
We propose a Motion and Appearance Adaptation (MAA) approach for cross-domain motion transfer.
- Score: 36.98500700394921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion transfer aims to transfer the motion of a driving video to a source
image. When there are considerable differences between the object in the driving
video and that in the source image, traditional single-domain motion transfer
approaches often produce notable artifacts; for example, the synthesized image
may fail to preserve the human shape of the source image (cf. Fig. 1(a)). To
address this issue, in this work, we propose a Motion and Appearance Adaptation
(MAA) approach for cross-domain motion transfer, in which we regularize the
object in the synthesized image to capture the motion of the object in the
driving frame, while still preserving the shape and appearance of the object in
the source image. On the one hand, considering that the object shapes in the synthesized
image and the driving frame may differ, we design a shape-invariant
motion adaptation module that enforces the consistency of the angles of object
parts in the two images to capture the motion information. On the other hand, we
introduce a structure-guided appearance consistency module designed to
regularize the similarity between the corresponding patches of the synthesized
image and the source image without affecting the learned motion in the
synthesized image. Our proposed MAA model can be trained in an end-to-end
manner with a cyclic reconstruction loss, and ultimately produces a
satisfactory motion transfer result (cf. Fig. 1(b)). We conduct extensive
experiments on the human dancing datasets (Mixamo-Video to Fashion-Video) and the human
face datasets (Vox-Celeb to Cufs); on both, our MAA model outperforms
existing methods both quantitatively and qualitatively.
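To make the two regularizers above concrete, here is a minimal PyTorch sketch of one plausible formulation: a part-angle consistency term for the shape-invariant motion adaptation and a patch-level cosine term for the structure-guided appearance consistency. The function names, the keypoint/part representation, and the grid-based patch correspondence are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of the two regularizers described in the abstract (PyTorch).
import torch
import torch.nn.functional as F


def angle_consistency_loss(kp_syn, kp_drv, part_pairs):
    """Shape-invariant motion adaptation (sketch).

    kp_syn, kp_drv: (B, K, 2) keypoints of the synthesized image and the
    driving frame. For each object part (a pair of keypoint indices) only the
    *angle* of the part vector is compared, not its length, so objects with
    different shapes can still agree on the motion.
    """
    losses = []
    for i, j in part_pairs:
        v_syn = kp_syn[:, j] - kp_syn[:, i]           # part vector in synthesized image
        v_drv = kp_drv[:, j] - kp_drv[:, i]           # part vector in driving frame
        cos = F.cosine_similarity(v_syn, v_drv, dim=-1)
        losses.append(1.0 - cos)                      # zero when the angles match
    return torch.stack(losses, dim=-1).mean()


def patch_appearance_loss(feat_syn, feat_src, patch=8):
    """Structure-guided appearance consistency (sketch).

    feat_syn, feat_src: (B, C, H, W) feature maps of the synthesized and source
    images. Corresponding patches (a simple grid here; the paper uses a
    structure-guided correspondence) are pulled together in feature space,
    constraining appearance without dictating the learned motion.
    """
    syn = F.avg_pool2d(feat_syn, patch)               # one descriptor per patch
    src = F.avg_pool2d(feat_src, patch)
    return (1.0 - F.cosine_similarity(syn, src, dim=1)).mean()
```

In a full training loop, both terms would simply be added, with suitable weights, to the cyclic reconstruction loss mentioned above.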
Related papers
- Instance-Level Moving Object Segmentation from a Single Image with Events [84.12761042512452]
Moving object segmentation plays a crucial role in understanding dynamic scenes involving multiple moving objects.
Previous methods encounter difficulties in distinguishing whether pixel displacements of an object are caused by camera motion or object motion.
Recent advances exploit the motion sensitivity of novel event cameras to counter conventional images' inadequate motion modeling capabilities.
We propose the first instance-level moving object segmentation framework that integrates complementary texture and motion cues.
arXiv Detail & Related papers (2025-02-18T15:56:46Z) - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior into video generators.
VideoJAM achieves state-of-the-art performance in motion coherence.
These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z) - MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method enabling video generation of similar motion in new contexts.
Multimodal representations from the recaptioned prompt and video frames promote the modeling of appearance.
Our method effectively learns specific motion patterns from a single or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z) - Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches [12.221087476416056]
We introduce "motion patches", a new representation of motion sequences, and propose using Vision Transformers (ViT) as motion encoders via transfer learning.
These motion patches, created by dividing and sorting skeleton joints based on motion sequences, are robust to varying skeleton structures.
We find that transfer learning from ViT weights pre-trained on 2D image data can boost the performance of motion analysis.
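A rough, hypothetical sketch of the motion-patch idea: a skeleton motion sequence is rearranged into an image-like tensor (time by joints, with xyz as channels) so that a ViT pre-trained on 2D images can encode it; the joint ordering used here is a stand-in for the paper's dividing-and-sorting scheme.

```python
# Hypothetical "motion patch" construction (PyTorch); layout only, no ViT code.
import torch


def motion_to_patches(motion, joint_order):
    """motion: (T, J, 3) sequence of 3D joint positions.

    Returns a (3, T, J) tensor laid out like an RGB image: time along one axis,
    re-ordered joints along the other, xyz as channels.
    """
    motion = motion[:, joint_order, :]            # group/sort joints by body part
    return motion.permute(2, 0, 1).contiguous()   # (channels=3, height=T, width=J)


# usage: a 64-frame clip with 22 joints, identity ordering for illustration
clip = torch.randn(64, 22, 3)
print(motion_to_patches(clip, list(range(22))).shape)  # torch.Size([3, 64, 22])
```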
arXiv Detail & Related papers (2024-05-08T02:42:27Z) - Continuous Piecewise-Affine Based Motion Model for Image Animation [45.55812811136834]
Image animation aims to bring static images to life according to driving videos.
Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image.
We propose to model motion from the source image to the driving frame in highly-expressive diffeomorphism spaces.
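For intuition, a simplified sketch of keypoint-driven warping in this line of unsupervised methods: fit one global affine map from keypoint correspondences and warp the source image with it. Actual methods fit local affine or thin-plate spline transforms per keypoint, and the paper above works in diffeomorphism spaces; the code below shows only the basic mechanism.

```python
# Minimal keypoint-to-affine warp sketch (PyTorch); a stand-in, not the paper's model.
import torch
import torch.nn.functional as F


def fit_affine(src_kp, drv_kp):
    """src_kp, drv_kp: (K, 2) keypoints in [-1, 1] grid coordinates.
    Returns a 2x3 matrix A with src ~ A @ [drv; 1] in the least-squares sense."""
    ones = torch.ones(drv_kp.shape[0], 1)
    X = torch.cat([drv_kp, ones], dim=1)              # (K, 3)
    return torch.linalg.lstsq(X, src_kp).solution.T   # (2, 3)


def warp_with_affine(image, theta):
    """image: (1, C, H, W); theta: (2, 3) sampling transform for grid_sample."""
    grid = F.affine_grid(theta.unsqueeze(0), list(image.shape), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)
```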
arXiv Detail & Related papers (2024-01-17T11:40:05Z) - Bidirectionally Deformable Motion Modulation For Video-based Human Pose
Transfer [19.5025303182983]
Video-based human pose transfer is a video-to-video generation task that animates a plain source human image based on a series of target human poses.
We propose a novel Deformable Motion Modulation (DMM) that utilizes geometric kernel offset with adaptive weight modulation to simultaneously perform discontinuous feature alignment and style transfer.
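A loose sketch of the "geometric kernel offset with adaptive weight modulation" idea, assembled from torchvision's modulated deformable convolution; the offset and mask predictors below are placeholders, not the DMM design.

```python
# Hedged sketch: deformable sampling with per-location modulation (PyTorch/torchvision).
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DeformModBlock(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
        # predict per-location kernel offsets and modulation masks from the features
        self.offset = nn.Conv2d(channels, 2 * k * k, 3, padding=1)
        self.mask = nn.Conv2d(channels, k * k, 3, padding=1)

    def forward(self, x):
        offset = self.offset(x)                    # geometric kernel offsets
        mask = torch.sigmoid(self.mask(x))         # adaptive weight modulation
        return deform_conv2d(x, offset, self.weight, padding=self.k // 2, mask=mask)
```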
arXiv Detail & Related papers (2023-07-15T09:24:45Z) - Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z) - REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer [96.64111294772141]
Given an image of a source person, Human Video Motion Transfer (HVMT) aims to generate a video of that person imitating the motion of the driving person.
Existing methods for HVMT mainly exploit Generative Adversarial Networks (GANs) to perform the warping operation.
This paper presents a novel REgion-to-whole human MOtion Transfer (REMOT) framework based on GANs.
arXiv Detail & Related papers (2022-09-01T14:03:51Z) - Differential Motion Evolution for Fine-Grained Motion Deformation in
Unsupervised Image Animation [41.85199775016731]
We introduce DiME, an end-to-end unsupervised motion transfer framework.
Capturing the motion transfer with an ordinary differential equation (ODE) helps to regularize the motion field.
We also propose a natural extension to the ODE idea, which is that DiME can easily leverage multiple different views of the source object whenever they are available.
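A toy sketch of the ODE view of motion evolution: rather than predicting the displacement in one shot, a learned velocity field is integrated over pseudo-time (forward Euler below), which smooths and regularizes the resulting motion field. The names and the integrator are illustrative assumptions, not DiME's implementation.

```python
# Forward-Euler integration of a learned velocity field (PyTorch); illustrative only.
import torch


def evolve_motion(velocity_net, coords, steps=8):
    """coords: (N, 2) pixel coordinates; velocity_net maps (N, 2) -> (N, 2).
    Integrates dx/dt = v(x) with `steps` Euler steps over t in [0, 1]."""
    dt = 1.0 / steps
    x = coords
    for _ in range(steps):
        x = x + dt * velocity_net(x)   # one Euler step along the velocity field
    return x                           # final displaced coordinates
```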
arXiv Detail & Related papers (2021-10-09T22:44:30Z) - Dual-MTGAN: Stochastic and Deterministic Motion Transfer for
Image-to-Video Synthesis [38.41763708731513]
We propose Dual Motion Transfer GAN (Dual-MTGAN), which takes image and video data as inputs while learning disentangled content and motion representations.
Our Dual-MTGAN is able to perform deterministic motion transfer and motion generation.
The proposed model is trained in an end-to-end manner, without the need to utilize pre-defined motion features like pose or facial landmarks.
arXiv Detail & Related papers (2021-02-26T06:54:48Z)