Related papers: Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings

Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings

URL: http://arxiv.org/abs/2409.07827v1
Date: Thu, 12 Sep 2024 08:19:25 GMT
Title: Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings
Authors: Tanisha Hisariya, Huan Zhang, Jinhua Liang,
Abstract summary: This research develops a model capable of generating music that resonates with the emotions depicted in visual arts. Addressing the scarcity of aligned art and music data, we curated the Emotion Painting Music dataset. Our dual-stage framework converts images to text descriptions of emotional content and then transforms these descriptions into music, facilitating efficient learning with minimal data.
Score: 10.302353984541497
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Rapid advancements in artificial intelligence have significantly enhanced generative tasks involving music and images, employing both unimodal and multimodal approaches. This research develops a model capable of generating music that resonates with the emotions depicted in visual arts, integrating emotion labeling, image captioning, and language models to transform visual inputs into musical compositions. Addressing the scarcity of aligned art and music data, we curated the Emotion Painting Music Dataset, pairing paintings with corresponding music for effective training and evaluation. Our dual-stage framework converts images to text descriptions of emotional content and then transforms these descriptions into music, facilitating efficient learning with minimal data. Performance is evaluated using metrics such as Fr\'echet Audio Distance (FAD), Total Harmonic Distortion (THD), Inception Score (IS), and KL divergence, with audio-emotion text similarity confirmed by the pre-trained CLAP model to demonstrate high alignment between generated music and text. This synthesis tool bridges visual art and music, enhancing accessibility for the visually impaired and opening avenues in educational and therapeutic applications by providing enriched multi-sensory experiences.

Related papers

Art2Mus: Artwork-to-Music Generation via Visual Conditioning and Large-Scale Cross-Modal Alignment [8.468469176803241]
We introduce ArtSound, a large-scale dataset of 105,884 artwork-music pairs enriched with dual-modality captions.<n>We propose ArtToMus, the first framework explicitly designed for direct artwork-to-music generation.<n>ArtToMus maps digitized artworks to music without image-to-text translation or language-based semantic supervision.
arXiv Detail & Related papers (2026-02-19T18:23:58Z)
Mozualization: Crafting Music and Visual Representation with Multimodal AI [11.229032883997748]
Mozualization is a music generation and editing tool that creates multi-style embedded music by integrating diverse inputs. Our work is inspired by the ways people express their emotions -- writing mood-descriptive poems or articles, creating drawings with warm or cool tones, or listening to sad or uplifting music.
arXiv Detail & Related papers (2025-04-05T08:22:20Z)
Aesthetic Matters in Music Perception for Image Stylization: A Emotion-driven Music-to-Visual Manipulation [13.052429836407052]
EmoMV is an emotion-driven music-to-visual manipulation method. We evaluate EmoMV using a multi-scale framework that includes image quality metrics, aesthetic assessments, and EEG measurements. Our results demonstrate that EmoMV effectively translates music's emotional content into visually compelling images.
arXiv Detail & Related papers (2025-01-03T08:41:53Z)
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data. Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge. We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z)
Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise [4.9485163144728235]
This work presents a novel method for composing and improvising music inspired by Cornelius Cardew's Treatise. By leveraging OpenAI's ChatGPT to interpret the abstract visual elements of Treatise, we convert these graphical images into descriptive textual prompts. These prompts are then input into MusicLDM, a pre-trained latent diffusion model designed for music generation.
arXiv Detail & Related papers (2024-12-12T05:08:36Z)
A Survey of Foundation Models for Music Understanding [60.83532699497597]
This work is one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models in respect of their music comprehension abilities.
arXiv Detail & Related papers (2024-09-15T03:34:14Z)
Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach [0.0]
We introduce a novel way to manipulate the emotional content of a song using AI tools. Our goal is to achieve the desired emotion while leaving the original melody as intact as possible. This research may contribute to on-demand custom music generation, the automated remixing of existing work, and music playlists tuned for emotional progression.
arXiv Detail & Related papers (2024-06-12T20:12:29Z)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations. We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
Generative Disco: Text-to-Video Generation for Music Visualization [9.53563436241774]
We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-video generation. The system helps users visualize music in intervals by finding prompts to describe the images that intervals start and end on and interpolating between them to the beat of the music. We introduce design patterns for improving these generated videos: transitions, which express shifts in color, time, subject, or style, and holds, which help focus the video on subjects.
arXiv Detail & Related papers (2023-04-17T18:44:00Z)
A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition [76.65908232134203]
Symbolic Music Emotion Recognition(SMER) is to predict music emotion from symbolic data, such as MIDI and MusicXML. In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
arXiv Detail & Related papers (2022-01-15T07:45:10Z)
Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space [80.49156615923106]
Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger. Existing emotion-based image and music matching methods either employ limited categorical emotion states or train the matching model using an impractical multi-stage pipeline. In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space.
arXiv Detail & Related papers (2020-08-22T20:12:23Z)
Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music. We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective. The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone. The experiments are conducted for three MIR tasks Artist Identification, Music Genre Classification and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.