Related papers: A Survey of Foundation Models for Music Understanding

URL: http://arxiv.org/abs/2409.09601v1
Date: Sun, 15 Sep 2024 03:34:14 GMT
Title: A Survey of Foundation Models for Music Understanding
Authors: Wenjun Li, Ying Cai, Ziyang Wu, Wenyi Zhang, Yifan Chen, Rundong Qi, Mengqi Dong, Peigen Chen, Xiao Dong, Fenghao Shi, Lei Guo, Junwei Han, Bao Ge, Tianming Liu, Lin Gan, Tuo Zhang
Abstract summary: This work is one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models in respect of their music comprehension abilities.
Score: 60.83532699497597
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Music is essential in daily life, fulfilling emotional and entertainment needs, and connecting us personally, socially, and culturally. A better understanding of music can enhance our emotions, cognitive skills, and cultural connections. The rapid advancement of artificial intelligence (AI) has introduced new ways to analyze music, aiming to replicate human understanding of music and provide related services. While the traditional models focused on audio features and simple tasks, the recent development of large language models (LLMs) and foundation models (FMs), which excel in various fields by integrating semantic information and demonstrating strong reasoning abilities, could capture complex musical features and patterns, integrate music with language and incorporate rich musical, emotional and psychological knowledge. Therefore, they have the potential in handling complex music understanding tasks from a semantic perspective, producing outputs closer to human perception. This work, to our best knowledge, is one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models in respect of their music comprehension abilities. We also discussed their limitations and proposed possible future directions, offering insights for researchers in this field.

Related papers

Generating Mixcode Popular Songs with Artificial Intelligence: Concepts, Plans, and Speculations [0.0]
This paper discusses a proposed project integrating artificial intelligence and popular music. The ultimate goal of the project is to create a powerful tool for implementing music for social transformation, education, healthcare, and emotional well-being.
arXiv Detail & Related papers (2024-11-10T10:49:13Z)
Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations. We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. Current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and Chain-of-Thoughts. We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z)
Are Words Enough? On the semantic conditioning of affective music generation [1.534667887016089]
This scoping review aims to analyze and discuss the possibilities of music generation conditioned by emotions. In detail, we review two main paradigms adopted in automatic music generation: rules-based and machine-learning models. We conclude that overcoming the limitation and ambiguity of language to express emotions through music has the potential to impact the creative industries.
arXiv Detail & Related papers (2023-11-07T00:19:09Z)
A Survey on Artificial Intelligence for Music Generation: Agents, Domains and Perspectives [10.349825060515181]
We describe how humans compose music and how new AI systems could imitate such a process. To understand how AI models and algorithms generate music, we explore, analyze and describe the agents that take part in the music generation process.
arXiv Detail & Related papers (2022-10-25T11:54:30Z)
Music Harmony Generation, through Deep Learning and Using a Multi-Objective Evolutionary Algorithm [0.0]
This paper introduces a genetic multi-objective evolutionary optimization algorithm for the generation of polyphonic music. One of the goals is the rules and regulations of music, which, along with the other two goals, including the scores of music experts and ordinary listeners, fits the cycle of evolution to get the most optimal response. The results show that the proposed method is able to generate difficult and pleasant pieces with desired styles and lengths, along with harmonic sounds that follow the grammar while attracting the listener, at the same time.
arXiv Detail & Related papers (2021-02-16T05:05:54Z)
Artificial Musical Intelligence: A Survey [51.477064918121336]
Music has become an increasingly prevalent domain of machine learning and artificial intelligence research. This article provides a definition of musical intelligence, introduces a taxonomy of its constituent components, and surveys the wide range of AI methods that can be, and have been, brought to bear in its pursuit.
arXiv Detail & Related papers (2020-06-17T04:46:32Z)
Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective. The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone. The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.