Make Pixels Dance: High-Dynamic Video Generation
- URL: http://arxiv.org/abs/2311.10982v1
- Date: Sat, 18 Nov 2023 06:25:58 GMT
- Title: Make Pixels Dance: High-Dynamic Video Generation
- Authors: Yan Zeng, Guoqiang Wei, Jiani Zheng, Jiaxin Zou, Yang Wei, Yuchen
Zhang, Hang Li
- Abstract summary: State-of-the-art video generation methods tend to produce video clips with minimal motions despite maintaining high fidelity.
We introduce PixelDance, a novel approach that incorporates image instructions for both the first and last frames in conjunction with text instructions for video generation.
- Score: 13.944607760918997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating high-dynamic videos such as motion-rich actions and sophisticated
visual effects poses a significant challenge in the field of artificial
intelligence. Unfortunately, current state-of-the-art video generation methods,
primarily focusing on text-to-video generation, tend to produce video clips
with minimal motions despite maintaining high fidelity. We argue that relying
solely on text instructions is insufficient and suboptimal for video
generation. In this paper, we introduce PixelDance, a novel approach based on
diffusion models that incorporates image instructions for both the first and
last frames in conjunction with text instructions for video generation.
Comprehensive experimental results demonstrate that PixelDance trained with
public data exhibits significantly better proficiency in synthesizing videos
with complex scenes and intricate motions, setting a new standard for video
generation.
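The abstract describes conditioning a diffusion model on image instructions for the first and last frames alongside a text instruction. A minimal sketch of how such per-frame image conditions might be assembled and fed to the denoiser is below; the function names, the zero-placeholder layout for unconditioned frames, and channel-wise concatenation with the noisy latents are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def build_image_conditions(first_frame, last_frame, num_frames):
    """Hypothetical sketch: place the first- and last-frame image
    instructions at the first and last positions of a per-frame
    condition stack; intermediate frames get zeros, signalling
    'no image instruction' for those positions."""
    h, w, c = first_frame.shape
    cond = np.zeros((num_frames, h, w, c), dtype=first_frame.dtype)
    cond[0] = first_frame
    cond[-1] = last_frame
    return cond

def concat_with_latents(latents, cond):
    """Join per-frame conditions with the noisy video latents along the
    channel axis, as one plausible way to inject them at each
    denoising step. latents: (T, H, W, C_lat); cond: (T, H, W, C_img)."""
    return np.concatenate([latents, cond], axis=-1)
```

Under this layout, the denoiser sees the target endpoints directly at every step, while the text instruction (handled separately, e.g. via cross-attention) describes the motion in between.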
Related papers
- Every Image Listens, Every Image Dances: Music-Driven Image Animation [8.085267959520843]
MuseDance is an end-to-end model that animates reference images using both music and text inputs.
Unlike existing approaches, MuseDance eliminates the need for complex motion guidance inputs, such as pose or depth sequences.
We present a new multimodal dataset comprising 2,904 dance videos with corresponding background music and text descriptions.
arXiv Detail & Related papers (2025-01-30T23:38:51Z)
- Video Creation by Demonstration [59.389591010842636]
We present $\delta$-Diffusion, a self-supervised training approach that learns from unlabeled videos by conditional future frame prediction.
By leveraging a video foundation model with an appearance bottleneck design on top, we extract action latents from demonstration videos for conditioning the generation process.
Empirically, $\delta$-Diffusion outperforms related baselines in terms of both human preference and large-scale machine evaluations.
arXiv Detail & Related papers (2024-12-12T18:41:20Z)
- Fleximo: Towards Flexible Text-to-Human Motion Video Generation [17.579663311741072]
We introduce a novel task aimed at generating human motion videos solely from reference images and natural language.
We propose a new framework called Fleximo, which leverages large-scale pre-trained text-to-3D motion models.
To assess the performance of Fleximo, we introduce a new benchmark called MotionBench, which includes 400 videos across 20 identities and 20 motions.
arXiv Detail & Related papers (2024-11-29T04:09:13Z)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling video dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z)
- Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance [36.26032505627126]
Recent advancements in text-to-video synthesis have unveiled the potential to achieve customized video generation with prompts only.
In this paper, we explore customized video generation by utilizing text as context description and motion structure.
Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model.
arXiv Detail & Related papers (2023-06-01T17:43:27Z)
- Text2Performer: Text-Driven Human Video Generation [97.3849869893433]
Text-driven content creation has evolved to be a transformative technique that revolutionizes creativity.
Here we study the task of text-driven human video generation, where a video sequence is synthesized from texts describing the appearance and motions of a target performer.
In this work, we present Text2Performer to generate vivid human videos with articulated motions from texts.
arXiv Detail & Related papers (2023-04-17T17:59:02Z)
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators [70.17041424896507]
Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.
We propose a new task of zero-shot text-to-video generation using existing text-to-image synthesis methods.
Our method performs comparably or sometimes better than recent approaches, despite not being trained on additional video data.
arXiv Detail & Related papers (2023-03-23T17:01:59Z)
- Dreamix: Video Diffusion Models are General Video Editors [22.127604561922897]
Text-driven image and video diffusion models have recently achieved unprecedented generation realism.
We present the first diffusion-based method that is able to perform text-based motion and appearance editing of general videos.
arXiv Detail & Related papers (2023-02-02T18:58:58Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- Video Generation from Text Employing Latent Path Construction for Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in Machine Learning and Computer Vision fields of study.
In this paper, we tackle the text to video generation problem, which is a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
arXiv Detail & Related papers (2021-07-29T06:28:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.