Continuous Piecewise-Affine Based Motion Model for Image Animation
- URL: http://arxiv.org/abs/2401.09146v1
- Date: Wed, 17 Jan 2024 11:40:05 GMT
- Title: Continuous Piecewise-Affine Based Motion Model for Image Animation
- Authors: Hexiang Wang, Fengqi Liu, Qianyu Zhou, Ran Yi, Xin Tan, Lizhuang Ma
- Abstract summary: Image animation aims to bring static images to life according to driving videos.
Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image.
We propose to model motion from the source image to the driving frame in highly-expressive diffeomorphism spaces.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image animation aims to bring static images to life according to driving
videos and create engaging visual content that can be used for various purposes
such as animation, entertainment, and education. Recent unsupervised methods
utilize affine and thin-plate spline transformations based on keypoints to
transfer the motion in driving frames to the source image. However, limited by
the expressive power of the transformations used, these methods always produce
poor results when the gap between the motion in the driving frame and the
source image is large. To address this issue, we propose to model motion from
the source image to the driving frame in highly-expressive diffeomorphism
spaces. Firstly, we introduce Continuous Piecewise-Affine based (CPAB)
transformation to model the motion and present a well-designed inference
algorithm to generate CPAB transformation from control keypoints. Secondly, we
propose a SAM-guided keypoint semantic loss to further constrain the keypoint
extraction process and improve the semantic consistency between the
corresponding keypoints on the source and driving images. Finally, we design a
structure alignment loss to align the structure-related features extracted from
driving and generated images, thus helping the generator generate results that
are more consistent with the driving action. Extensive experiments on four
datasets demonstrate the effectiveness of our method against state-of-the-art
competitors quantitatively and qualitatively. Code will be publicly available
at: https://github.com/DevilPG/AAAI2024-CPABMM.
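The core idea behind a CPAB transformation is that integrating a continuous piecewise-affine (CPA) velocity field yields a diffeomorphic warp. A minimal 1D sketch of this idea (an illustrative toy, not the paper's 2D implementation; the knot positions, velocity values, and step counts below are made up):

```python
import numpy as np

def cpab_warp_1d(x, knots, v_at_knots, steps=200, T=1.0):
    """Warp points x by integrating dx/dt = v(x) for time T.

    v is a continuous piecewise-affine velocity field, obtained here by
    linearly interpolating velocity values given at knot points. Because
    the flow of a continuous velocity field is invertible, the resulting
    warp is a diffeomorphism; integrating with T = -1.0 approximately
    inverts it.
    """
    dt = T / steps
    y = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        # forward-Euler step through the CPA velocity field
        y = y + dt * np.interp(y, knots, v_at_knots)
    return y
```

In this toy version the warp is parameterized by the velocities at the knots, so a keypoint-driven model would predict those values; the forward-Euler loop stands in for the closed-form cell-wise integration used in the CPAB literature.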
Related papers
- Distance Weighted Trans Network for Image Completion
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
DragGAN is a new way of controlling generative adversarial networks (GANs).
DragGAN allows anyone to deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc.
Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking.
arXiv Detail & Related papers (2023-05-18T13:41:25Z)
- Motion and Appearance Adaptation for Cross-Domain Motion Transfer
Motion transfer aims to transfer the motion of a driving video to a source image.
Traditional single domain motion transfer approaches often produce notable artifacts.
We propose a Motion and Appearance Adaptation (MAA) approach for cross-domain motion transfer.
arXiv Detail & Related papers (2022-09-29T03:24:47Z)
- Motion Transformer for Unsupervised Image Animation
Image animation aims to animate a source image by using motion learned from a driving video.
Current state-of-the-art methods typically use convolutional neural networks (CNNs) to predict motion information.
We propose a new method, the motion transformer, which is the first attempt to build a motion estimator based on a vision transformer.
arXiv Detail & Related papers (2022-09-28T12:04:58Z)
- A Constrained Deformable Convolutional Network for Efficient Single Image Dynamic Scene Blind Deblurring with Spatially-Variant Motion Blur Kernels Estimation
We propose a novel constrained deformable convolutional network (CDCN) for efficient single image dynamic scene blind deblurring.
CDCN simultaneously achieves accurate estimation of spatially-variant motion blur kernels and high-quality image restoration.
arXiv Detail & Related papers (2022-08-23T03:28:21Z)
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit that adapts to the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
- Thin-Plate Spline Motion Model for Image Animation
Image animation brings life to the static object in the source image according to the driving video.
Recent works attempt to perform motion transfer on arbitrary objects through unsupervised methods without using a priori knowledge.
It remains a significant challenge for current unsupervised methods when there is a large pose gap between the objects in the source and driving images.
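The thin-plate spline keypoint warp underlying this line of work can be sketched with SciPy's generic `RBFInterpolator` (a plain TPS fit between two keypoint sets, not the paper's unsupervised pipeline; the keypoint coordinates below are made up):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical keypoints: five control points detected in the driving
# frame and their (slightly shifted) counterparts in the source image.
drv_kp = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.5],
                   [0.2, 0.8], [0.8, 0.8]])
src_kp = drv_kp + np.array([[0.05, 0.0], [0.0, 0.03], [-0.02, 0.04],
                            [0.03, -0.02], [0.0, 0.0]])

# Fit a thin-plate spline mapping driving-frame coordinates to source
# coordinates; sampling the source image at tps(grid_points) warps it
# toward the driving pose.
tps = RBFInterpolator(drv_kp, src_kp, kernel="thin_plate_spline")
```

With the default zero smoothing the spline interpolates the control points exactly, so `tps(drv_kp)` reproduces `src_kp`; motion between keypoints is filled in by the minimum-bending-energy TPS solution.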
arXiv Detail & Related papers (2022-03-27T18:40:55Z)
- Optical Flow Estimation from a Single Motion-blurred Image
Motion blur in an image can be of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.