Generative Disco: Text-to-Video Generation for Music Visualization
- URL: http://arxiv.org/abs/2304.08551v2
- Date: Thu, 28 Sep 2023 16:14:54 GMT
- Title: Generative Disco: Text-to-Video Generation for Music Visualization
- Authors: Vivian Liu, Tao Long, Nathan Raw, Lydia Chilton
- Abstract summary: We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-video generation.
The system helps users visualize music in intervals by finding prompts to describe the images that intervals start and end on and interpolating between them to the beat of the music.
We introduce design patterns for improving these generated videos: transitions, which express shifts in color, time, subject, or style, and holds, which help focus the video on subjects.
- Score: 9.53563436241774
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visuals can enhance our experience of music, owing to the way they can
amplify the emotions and messages conveyed within it. However, creating music
visualization is a complex, time-consuming, and resource-intensive process. We
introduce Generative Disco, a generative AI system that helps generate music
visualizations with large language models and text-to-video generation. The
system helps users visualize music in intervals by finding prompts to describe
the images that intervals start and end on and interpolating between them to
the beat of the music. We introduce design patterns for improving these
generated videos: transitions, which express shifts in color, time, subject, or
style, and holds, which help focus the video on subjects. A study with
professionals showed that transitions and holds were a highly expressive
framework that enabled them to build coherent visual narratives. We conclude on
the generalizability of these patterns and the potential of generated video for
creative professionals.
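The abstract describes interpolating between an interval's start and end images "to the beat of the music". A minimal sketch of one way such a schedule could be computed is given here; it is an illustration under stated assumptions, not the system's actual implementation. The function names (`beat_aligned_weights`, `lerp`), the smoothstep easing, the equal progress per beat, and the toy embedding vectors are all hypothetical choices made for this example.

```python
from bisect import bisect_right


def beat_aligned_weights(start, end, beats, fps):
    """Return one blend weight in [0, 1] per frame of the interval [start, end].

    Assumption for this sketch: progress advances by an equal share between
    consecutive beats, with smoothstep easing inside each beat segment.
    Requires end > start and fps > 0.
    """
    # Knots: interval start, each distinct beat strictly inside it, interval end.
    inner = sorted({b for b in beats if start < b < end})
    knots = [start] + inner + [end]
    segments = len(knots) - 1
    n_frames = int(round((end - start) * fps)) + 1
    weights = []
    for i in range(n_frames):
        t = min(start + i / fps, end)
        # Index of the segment containing t (the last frame maps to the last segment).
        k = min(bisect_right(knots, t) - 1, segments - 1)
        local = (t - knots[k]) / (knots[k + 1] - knots[k])
        eased = local * local * (3 - 2 * local)  # smoothstep easing
        weights.append((k + eased) / segments)
    return weights


def lerp(a, b, w):
    """Linearly blend two equal-length vectors with weight w."""
    return [(1 - w) * x + w * y for x, y in zip(a, b)]


# Toy example: a 2-second interval at 4 frames per second, beats every 0.5 s.
weights = beat_aligned_weights(0.0, 2.0, [0.5, 1.0, 1.5], fps=4)
start_vec, end_vec = [0.0, 1.0], [1.0, 0.0]  # stand-ins for prompt embeddings
frames = [lerp(start_vec, end_vec, w) for w in weights]
```

In this sketch each beat lands exactly on a quarter of the overall transition, so the visual change is paced by the rhythm rather than by elapsed time alone.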
Related papers
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings [10.302353984541497]
This research develops a model capable of generating music that resonates with the emotions depicted in visual arts.
Addressing the scarcity of aligned art and music data, we curated the Emotion Painting Music dataset.
Our dual-stage framework converts images to text descriptions of emotional content and then transforms these descriptions into music, facilitating efficient learning with minimal data.
arXiv Detail & Related papers (2024-09-12T08:19:25Z)
- MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence [62.72540590546812]
MovieDreamer is a novel hierarchical framework that integrates the strengths of autoregressive models with diffusion-based rendering.
We present experiments across various movie genres, demonstrating that our approach achieves superior visual and narrative quality.
arXiv Detail & Related papers (2024-07-23T17:17:05Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation [69.20173154096]
We develop a framework comprised of two functional modules, Motion Structure Retrieval and Structure-Guided Text-to-Video Synthesis.
For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.
For the second module, we propose a controllable video generation model that offers flexible controls over structure and characters.
arXiv Detail & Related papers (2023-07-13T17:57:13Z)
- TräumerAI: Dreaming Music with StyleGAN [2.578242050187029]
We propose a neural music visualizer directly mapping deep music embeddings to style embeddings of StyleGAN.
An annotator listened to 100 music clips, each 10 seconds long, and selected an image that suits the music from among the StyleGAN-generated examples.
The generated examples show that the mapping between audio and video achieves a certain level of intra-segment similarity and inter-segment dissimilarity.
arXiv Detail & Related papers (2021-02-09T07:04:22Z)
- Lets Play Music: Audio-driven Performance Video Generation [58.77609661515749]
We propose a new task named Audio-driven Performance Video Generation (APVG).
APVG aims to synthesize the video of a person playing a certain instrument guided by a given music audio clip.
arXiv Detail & Related papers (2020-11-05T03:13:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)