A Multimodal Symphony: Integrating Taste and Sound through Generative AI
- URL: http://arxiv.org/abs/2503.02823v1
- Date: Tue, 04 Mar 2025 17:48:48 GMT
- Title: A Multimodal Symphony: Integrating Taste and Sound through Generative AI
- Authors: Matteo Spanio, Massimiliano Zampini, Antonio Rodà , Franco Pierucci,
- Abstract summary: This article explores multimodal generative models capable of converting taste information into music.<n>We present an experiment in which a fine-tuned version of a generative music model (MusicGEN) is used to generate music based on detailed taste descriptions provided for each musical piece.
- Score: 1.2749527861829049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent decades, neuroscientific and psychological research has traced direct relationships between taste and auditory perceptions. This article explores multimodal generative models capable of converting taste information into music, building on this foundational research. We provide a brief review of the state of the art in this field, highlighting key findings and methodologies. We present an experiment in which a fine-tuned version of a generative music model (MusicGEN) is used to generate music based on detailed taste descriptions provided for each musical piece. The results are promising: according the participants' ($n=111$) evaluation, the fine-tuned model produces music that more coherently reflects the input taste descriptions compared to the non-fine-tuned model. This study represents a significant step towards understanding and developing embodied interactions between AI, sound, and taste, opening new possibilities in the field of generative AI. We release our dataset, code and pre-trained model at: https://osf.io/xs5jy/.
Related papers
- Prevailing Research Areas for Music AI in the Era of Foundation Models [8.067636023395236]
There has been a surge of generative music AI applications within the past few years.
We discuss the current state of music datasets and their limitations.
We highlight applications of these generative models towards extensions to multiple modalities and integration with artists' workflow.
arXiv Detail & Related papers (2024-09-14T09:06:43Z) - Foundation Models for Music: A Survey [77.77088584651268]
Foundations models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z) - Between the AI and Me: Analysing Listeners' Perspectives on AI- and Human-Composed Progressive Metal Music [1.2874569408514918]
We explore participants' perspectives on AI- vs human-generated progressive metal, using rock music as a control group.
We propose a mixed methods approach to assess the effects of generation type (human vs. AI), genre (progressive metal vs. rock), and curation process (random vs. cherry-picked)
Our findings validate the use of fine-tuning to achieve genre-specific specialization in AI music generation.
Despite some AI-generated excerpts receiving similar ratings to human music, listeners exhibited a preference for human compositions.
arXiv Detail & Related papers (2024-07-31T14:03:45Z) - MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - StemGen: A music generation model that listens [9.489938613869864]
We present an alternative paradigm for producing music generation models that can listen and respond to musical context.
We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture.
The resulting model reaches the audio quality of state-of-the-art text-conditioned models, as well as exhibiting strong musical coherence with its context.
arXiv Detail & Related papers (2023-12-14T08:09:20Z) - Exploring Variational Auto-Encoder Architectures, Configurations, and
Datasets for Generative Music Explainable AI [7.391173255888337]
Generative AI models for music and the arts are increasingly complex and hard to understand.
One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on generative AI models.
This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE) have on music generation performance.
arXiv Detail & Related papers (2023-11-14T17:27:30Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 public-available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z) - Learning to Generate Music With Sentiment [1.8275108630751844]
This paper presents a generative Deep Learning model that can be directed to compose music with a given sentiment.
Besides music generation, the same model can be used for sentiment analysis of symbolic music.
arXiv Detail & Related papers (2021-03-09T03:16:52Z) - Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with
Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks Artist Identification, Music Genre Classification and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.