Every Painting Awakened: A Training-free Framework for Painting-to-Animation Generation
- URL: http://arxiv.org/abs/2503.23736v1
- Date: Mon, 31 Mar 2025 05:25:49 GMT
- Title: Every Painting Awakened: A Training-free Framework for Painting-to-Animation Generation
- Authors: Lingyu Liu, Yaxiong Wang, Li Zhu, Zhedong Zheng
- Abstract summary: We introduce a training-free framework specifically designed to bring real-world static paintings to life through image-to-video (I2V) synthesis. Existing I2V methods, primarily trained on natural video datasets, often struggle to generate dynamic outputs from static paintings. Our framework enables plug-and-play integration with existing I2V methods, making it an ideal solution for animating real-world paintings.
- Score: 25.834500552609136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a training-free framework specifically designed to bring real-world static paintings to life through image-to-video (I2V) synthesis, addressing the persistent challenge of aligning the synthesized motion with textual guidance while preserving fidelity to the original artworks. Existing I2V methods, primarily trained on natural video datasets, often struggle to generate dynamic outputs from static paintings. It remains challenging to generate motion while maintaining visual consistency with real-world paintings. This results in two distinct failure modes: either static outputs due to limited text-based motion interpretation or distorted dynamics caused by inadequate alignment with real-world artistic styles. We leverage the advanced text-image alignment capabilities of pre-trained image models to guide the animation process. Our approach introduces synthetic proxy images through two key innovations: (1) Dual-path score distillation: We employ a dual-path architecture to distill motion priors from both real and synthetic data, preserving static details from the original painting while learning dynamic characteristics from synthetic frames. (2) Hybrid latent fusion: We integrate hybrid features extracted from real paintings and synthetic proxy images via spherical linear interpolation in the latent space, ensuring smooth transitions and enhancing temporal consistency. Experimental evaluations confirm that our approach significantly improves semantic alignment with text prompts while faithfully preserving the unique characteristics and integrity of the original paintings. Crucially, by achieving enhanced dynamic effects without requiring any model training or learnable parameters, our framework enables plug-and-play integration with existing I2V methods, making it an ideal solution for animating real-world paintings. More animated examples can be found on our project website.
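As an illustration of the hybrid latent fusion step described in the abstract, the sketch below blends two latents with spherical linear interpolation (slerp). This is a minimal, framework-agnostic example rather than the authors' released code: the tensor names `z_real` (encoded painting) and `z_proxy` (encoded synthetic proxy), the blending weight `alpha`, and the flattened-vector formulation are assumptions made for clarity.

```python
# Minimal sketch of latent fusion via spherical linear interpolation (slerp).
# Assumption: `z_real` and `z_proxy` are latent tensors of identical shape;
# the names and the weight `alpha` are hypothetical, not from the paper's code.
import torch


def slerp(z_real: torch.Tensor, z_proxy: torch.Tensor,
          alpha: float, eps: float = 1e-7) -> torch.Tensor:
    """Spherically interpolate between two latents, treated as flat vectors."""
    a, b = z_real.flatten().float(), z_proxy.flatten().float()
    # Angle between the two latents.
    cos_theta = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps),
                            -1.0, 1.0)
    theta = torch.acos(cos_theta)
    if theta.abs() < eps:
        # Nearly parallel latents: fall back to plain linear interpolation.
        fused = (1.0 - alpha) * a + alpha * b
    else:
        sin_theta = torch.sin(theta)
        fused = (torch.sin((1.0 - alpha) * theta) / sin_theta) * a \
              + (torch.sin(alpha * theta) / sin_theta) * b
    return fused.reshape(z_real.shape).to(z_real.dtype)


# Usage: blend a modest amount of the synthetic proxy's dynamics into the
# painting latent (random tensors stand in for real encodings here).
z_real = torch.randn(4, 64, 64)
z_proxy = torch.randn(4, 64, 64)
z_fused = slerp(z_real, z_proxy, alpha=0.3)
```

Slerp follows the geodesic on the hypersphere rather than a straight line, which keeps the fused latent's norm close to those of its inputs; this is a common reason it is preferred over plain linear interpolation when mixing diffusion latents.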
Related papers
- Dynamic Typography: Bringing Text to Life via Video Diffusion Prior [73.72522617586593]
We present an automated text animation scheme, termed "Dynamic Typography".
It deforms letters to convey semantic meaning and infuses them with vibrant movements based on user prompts.
Our technique harnesses vector graphics representations and an end-to-end optimization-based framework.
arXiv Detail & Related papers (2024-04-17T17:59:55Z) - ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z) - DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our proposed method can produce visually convincing and more logical & natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z) - Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model [57.855362366674264]
We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues.
Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.
arXiv Detail & Related papers (2023-08-15T13:00:42Z) - Text-Guided Synthesis of Eulerian Cinemagraphs [81.20353774053768]
We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions.
We focus on cinemagraphs of fluid elements, such as flowing rivers and drifting clouds, which exhibit continuous motion and repetitive textures.
arXiv Detail & Related papers (2023-07-06T17:59:31Z) - Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances [3.95944314850151]
We present a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering.
Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters.
For realistic real-time rendering, we train a U-Net that refines pixelization-based renderings by computing improved colors and a foreground matte.
arXiv Detail & Related papers (2023-06-16T17:58:04Z) - Improving the Perceptual Quality of 2D Animation Interpolation [37.04208600867858]
Traditional 2D animation is labor-intensive, often requiring animators to draw twelve illustrations per second of movement.
Lower framerates result in larger displacements and occlusions, and discrete perceptual elements (e.g., lines and solid-color regions) pose difficulties for texture-oriented convolutional networks.
Previous work tried addressing these issues, but used unscalable methods and focused on pixel-perfect performance.
We build a scalable system more appropriately centered on perceptual quality for this artistic domain.
arXiv Detail & Related papers (2021-11-24T20:51:29Z) - Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)