MovieFactory: Automatic Movie Creation from Text using Large Generative
Models for Language and Images
- URL: http://arxiv.org/abs/2306.07257v1
- Date: Mon, 12 Jun 2023 17:31:23 GMT
- Title: MovieFactory: Automatic Movie Creation from Text using Large Generative
Models for Language and Images
- Authors: Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang
Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu
- Abstract summary: We present MovieFactory, a framework to generate cinematic-picture (3072$\times$1280), film-style (multi-scene), and multi-modality (sounding) movies.
Our approach empowers users to create captivating movies with smooth transitions using simple text inputs.
- Score: 92.13079696503803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present MovieFactory, a powerful framework to generate
cinematic-picture (3072$\times$1280), film-style (multi-scene), and
multi-modality (sounding) movies on demand from natural-language input. To the
best of our knowledge, this is the first fully automated movie generation
model; it empowers users to create captivating movies with smooth transitions
from simple text inputs, surpassing existing methods that produce soundless
videos limited to a single scene of modest quality. To facilitate this
distinctive functionality, we leverage ChatGPT to expand user-provided text
into detailed sequential scripts for movie generation. Then we bring scripts to
life visually and acoustically through vision generation and audio retrieval.
To generate videos, we extend the capabilities of a pretrained text-to-image
diffusion model through a two-stage process. First, we employ spatial
finetuning to bridge the gap between the pretrained image model and the new
video dataset. Then, we introduce temporal learning to capture object
motion. In terms of audio, we leverage sophisticated retrieval models to select
and align audio elements that correspond to the plot and visual content of the
movie. Extensive experiments demonstrate that our MovieFactory produces movies
with realistic visuals, diverse scenes, and seamlessly fitting audio, offering
users a novel and immersive experience. Generated samples can be found on
YouTube or Bilibili (1080P).
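The three-stage pipeline described in the abstract (LLM script expansion, per-scene video generation, audio retrieval) can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: every function and data structure here is a hypothetical stand-in, with the generation and retrieval models replaced by trivial stubs.

```python
# Hypothetical sketch of a MovieFactory-style pipeline:
# (1) expand a short user prompt into a multi-scene script,
# (2) generate a video clip per scene,
# (3) retrieve an audio track matching each scene.
# All names are illustrative stand-ins, not the authors' API.

from dataclasses import dataclass


@dataclass
class Scene:
    description: str  # detailed scene description from the LLM step
    clip: str         # placeholder for generated video frames
    audio: str        # placeholder for the retrieved audio track


def expand_script(prompt: str, num_scenes: int = 3) -> list[str]:
    """Stand-in for the ChatGPT expansion step: one prompt becomes
    sequential per-scene descriptions."""
    return [f"Scene {i + 1}: {prompt}" for i in range(num_scenes)]


def generate_clip(description: str) -> str:
    """Stand-in for the two-stage text-to-video model
    (spatial finetuning + temporal learning)."""
    return f"<video clip for: {description}>"


def retrieve_audio(description: str, library: dict[str, str]) -> str:
    """Stand-in for audio retrieval: pick the library track whose tags
    share the most words with the scene description."""
    words = set(description.lower().split())
    return max(library, key=lambda track: len(words & set(library[track].split())))


def make_movie(prompt: str, audio_library: dict[str, str]) -> list[Scene]:
    """Run the full prompt -> script -> video + audio pipeline."""
    return [
        Scene(d, generate_clip(d), retrieve_audio(d, audio_library))
        for d in expand_script(prompt)
    ]
```

The key design point the abstract emphasizes is the decoupling: script expansion, visual generation, and audio selection are separate stages, so each can be swapped out (e.g. a different LLM or retrieval model) without retraining the others.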
Related papers
- Read, Watch and Scream! Sound Generation from Text and Video [23.990569918960315]
We propose a novel video-and-text-to-sound generation method called ReWaS.
Our method estimates the structural information of audio from the video while receiving key content cues from a user prompt.
By separating the generative components of audio, it becomes a more flexible system that allows users to freely adjust the energy, surrounding environment, and primary sound source according to their preferences.
arXiv Detail & Related papers (2024-07-08T01:59:17Z)
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling video dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z)
- VideoStudio: Generating Consistent-Content and Multi-Scene Videos [88.88118783892779]
VideoStudio is a framework for consistent-content and multi-scene video generation.
VideoStudio leverages Large Language Models (LLMs) to convert the input prompt into a comprehensive multi-scene script.
VideoStudio outperforms the SOTA video generation models in terms of visual quality, content consistency, and user preference.
arXiv Detail & Related papers (2024-01-02T15:56:48Z)
- WavJourney: Compositional Audio Creation with Large Language Models [38.39551216587242]
We present WavJourney, a novel framework that leverages Large Language Models to connect various audio models for audio creation.
WavJourney allows users to create storytelling audio content with diverse audio elements simply from textual descriptions.
We show that WavJourney is capable of synthesizing realistic audio aligned with textually-described semantic, spatial and temporal conditions.
arXiv Detail & Related papers (2023-07-26T17:54:04Z)
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation [69.20173154096]
We develop a framework comprised of two functional modules, Motion Structure Retrieval and Structure-Guided Text-to-Video Synthesis.
For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.
For the second module, we propose a controllable video generation model that offers flexible controls over structure and characters.
arXiv Detail & Related papers (2023-07-13T17:57:13Z)
- CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models [50.42886595228255]
We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge.
We train a conditional diffusion model to generate the audio track of a video, given a video frame encoded by a pretrained contrastive language-image pretraining model.
arXiv Detail & Related papers (2023-06-16T05:42:01Z)
- Fine-grained Audible Video Description [61.81122862375985]
We construct the first fine-grained audible video description benchmark (FAVDBench).
For each video clip, we first provide a one-sentence summary of the video, followed by 4-6 sentences describing the visual details and 1-2 audio-related descriptions at the end.
We demonstrate that employing fine-grained video descriptions produces more intricate videos than using plain captions.
arXiv Detail & Related papers (2023-03-27T22:03:48Z)
- Sound-Guided Semantic Video Generation [15.225598817462478]
We propose a framework to generate realistic videos by leveraging multimodal (sound-image-text) embedding space.
As sound provides the temporal contexts of the scene, our framework learns to generate a video that is semantically consistent with sound.
arXiv Detail & Related papers (2022-04-20T07:33:10Z)