Intelligent Director: An Automatic Framework for Dynamic Visual
Composition using ChatGPT
- URL: http://arxiv.org/abs/2402.15746v1
- Date: Sat, 24 Feb 2024 06:58:15 GMT
- Title: Intelligent Director: An Automatic Framework for Dynamic Visual
Composition using ChatGPT
- Authors: Sixiao Zheng, Jingyang Huo, Yu Wang, Yanwei Fu
- Abstract summary: We propose the Dynamic Visual Composition (DVC) task to automatically integrate various media elements based on user requirements and create storytelling videos.
We propose an Intelligent Director framework that uses LENS to generate descriptions for images and video frames and ChatGPT to generate coherent captions.
We construct the UCF101-DVC and Personal Album datasets and verify the effectiveness of our framework.
- Score: 47.40350722537004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise of short video platforms represented by TikTok, the trend of
users expressing their creativity through photos and videos has increased
dramatically. However, ordinary users lack the professional skills to produce
high-quality videos using professional creation software. To meet the demand
for intelligent and user-friendly video creation tools, we propose the Dynamic
Visual Composition (DVC) task, an interesting and challenging task that aims to
automatically integrate various media elements based on user requirements and
create storytelling videos. We propose an Intelligent Director framework that
uses LENS to generate descriptions for images and video frames and ChatGPT to
generate coherent captions while recommending appropriate music names. The
best-matched music is then obtained through music retrieval, and the captions,
images, videos, and music are integrated to seamlessly synthesize the video.
Finally, we apply AnimeGANv2 for style transfer. We construct the UCF101-DVC
and Personal Album datasets and verify the effectiveness of our framework in
solving DVC through qualitative and quantitative comparisons, along with user
studies, demonstrating its substantial potential.
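The pipeline described above lends itself to a simple staged design. Below is a minimal, hedged Python sketch of that flow; every name (Storyboard, describe_media, write_captions_and_music, retrieve_music, compose_and_stylize) is a hypothetical placeholder standing in for the LENS description stage, the ChatGPT caption and music-recommendation prompt, music retrieval, video synthesis, and AnimeGANv2 stylization, not the authors' actual implementation.

```python
# Hedged sketch of the Intelligent Director flow described in the abstract.
# All stage functions are illustrative stubs, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Storyboard:
    media_paths: list[str]                                  # user-supplied photos / video clips
    descriptions: list[str] = field(default_factory=list)   # per-item descriptions (LENS stage)
    captions: list[str] = field(default_factory=list)       # coherent captions (ChatGPT stage)
    music_query: str = ""                                    # recommended music name (ChatGPT stage)
    music_path: str = ""                                     # best-matched track (retrieval stage)

def describe_media(paths: list[str]) -> list[str]:
    """Placeholder for LENS: one natural-language description per image/frame."""
    return [f"description of {p}" for p in paths]

def write_captions_and_music(descriptions: list[str], user_request: str) -> tuple[list[str], str]:
    """Placeholder for a ChatGPT call that turns descriptions plus the user's
    requirements into coherent captions and a recommended music name."""
    captions = [f"caption for: {d}" for d in descriptions]
    return captions, "suggested music name"

def retrieve_music(query: str) -> str:
    """Placeholder for retrieving the best-matched track from a music library."""
    return "library/best_match.mp3"

def compose_and_stylize(board: Storyboard) -> str:
    """Placeholder for synthesizing the video from captions, media, and music,
    then applying AnimeGANv2-style transfer; returns the output path."""
    return "output/storytelling_video.mp4"

def intelligent_director(media_paths: list[str], user_request: str) -> str:
    board = Storyboard(media_paths=media_paths)
    board.descriptions = describe_media(board.media_paths)
    board.captions, board.music_query = write_captions_and_music(board.descriptions, user_request)
    board.music_path = retrieve_music(board.music_query)
    return compose_and_stylize(board)
```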
Related papers
- GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions [13.9134271174972]
We present the General Video-to-Music Generation model (GVMGen) for generating music that is highly relevant to the video input.
Our model employs hierarchical attentions to extract and align video features with music in both spatial and temporal dimensions.
Our method is versatile, capable of generating multi-style music from different video inputs, even in zero-shot scenarios.
arXiv Detail & Related papers (2025-01-17T06:30:11Z)
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos [32.741262543860934]
We present a framework for learning to generate background music from video inputs.
We develop a generative video-music Transformer with a novel semantic video-music alignment scheme.
A new temporal video encoder architecture allows us to efficiently process videos consisting of many densely sampled frames.
arXiv Detail & Related papers (2024-09-11T17:56:48Z)
- One-Shot Pose-Driving Face Animation Platform [7.422568903818486]
We refine an existing Image2Video model by integrating a Face Locator and Motion Frame mechanism.
We optimize the model using extensive human face video datasets, significantly enhancing its ability to produce high-quality talking head videos.
We develop a demo platform using the Gradio framework, which streamlines the process, enabling users to quickly create customized talking head videos.
arXiv Detail & Related papers (2024-07-12T03:09:07Z)
- VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling [71.01050359126141]
We propose VidMuse, a framework for generating music aligned with video inputs.
VidMuse produces high-fidelity music that is both acoustically and semantically aligned with the video.
arXiv Detail & Related papers (2024-06-06T17:58:11Z)
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation [69.20173154096]
We develop a framework comprising two functional modules, Motion Structure Retrieval and Structure-Guided Text-to-Video Synthesis.
For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.
For the second module, we propose a controllable video generation model that offers flexible controls over structure and characters.
arXiv Detail & Related papers (2023-07-13T17:57:13Z)
- MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images [92.13079696503803]
We present MovieFactory, a framework to generate cinematic-picture (3072$\times$1280), film-style (multi-scene), and multi-modality (with sound) movies.
Our approach empowers users to create captivating movies with smooth transitions using simple text inputs.
arXiv Detail & Related papers (2023-06-12T17:31:23Z)
- Generative Disco: Text-to-Video Generation for Music Visualization [9.53563436241774]
We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-video generation.
The system helps users visualize music interval by interval: it finds prompts describing the images each interval should start and end on, then interpolates between them to the beat of the music (see the sketch after this list).
We introduce design patterns for improving these generated videos: transitions, which express shifts in color, time, subject, or style, and holds, which help focus the video on subjects.
arXiv Detail & Related papers (2023-04-17T18:44:00Z)
- Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production [92.14891282042764]
We present Virtual Dynamic Storyboard (VDS), which allows users to storyboard shots in virtual environments.
VDS operates in a "propose-simulate-discriminate" mode: given a formatted story script and a camera script as input, it generates several character animation and camera movement proposals.
To pick the top-quality dynamic storyboard from the candidates, we equip it with a shot ranking discriminator based on shot quality criteria learned from professional, manually created data.
arXiv Detail & Related papers (2023-01-30T06:37:35Z)
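As referenced in the Generative Disco entry above, the following is a minimal sketch (not the authors' code) of the beat-synchronized interval idea: the track is segmented at detected beats, and each interval is paired with a start and end prompt that a text-to-video model could interpolate between. It assumes librosa for beat tracking; the prompt list, beats_per_interval, and helper names are illustrative, and prompt selection plus video generation are left out.

```python
# Hedged sketch of beat-synchronized intervals in the spirit of Generative Disco.
# Requires librosa; generation of the actual visuals is not shown.
import librosa

def beat_intervals(audio_path: str, beats_per_interval: int = 8) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds, one per group of consecutive beats."""
    y, sr = librosa.load(audio_path)
    _tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    intervals = []
    for i in range(0, len(beat_times) - beats_per_interval, beats_per_interval):
        intervals.append((float(beat_times[i]), float(beat_times[i + beats_per_interval])))
    return intervals

def plan_visualization(audio_path: str, prompts: list[str]) -> list[dict]:
    """Pair each beat interval with a start/end prompt a text-to-video model
    could interpolate between; prompts must be non-empty."""
    plan = []
    for idx, (start, end) in enumerate(beat_intervals(audio_path)):
        plan.append({
            "start": start,
            "end": end,
            "start_prompt": prompts[idx % len(prompts)],
            "end_prompt": prompts[(idx + 1) % len(prompts)],
        })
    return plan
```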