Related papers: Prevailing Research Areas for Music AI in the Era of Foundation Models

URL: http://arxiv.org/abs/2409.09378v3
Date: Tue, 04 Nov 2025 04:47:24 GMT
Title: Prevailing Research Areas for Music AI in the Era of Foundation Models
Authors: Megan Wei, Mateusz Modrzejewski, Aswin Sivaraman, Dorien Herremans
Abstract summary: As AI-generated and AI-augmented music become increasingly mainstream, many researchers in the music AI community may wonder: what research frontiers remain unexplored? This paper outlines several key areas within music AI research that present significant opportunities for further investigation.
Score: 10.245601263106844
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Parallel to rapid advancements in foundation model research, the past few years have witnessed a surge in music AI applications. As AI-generated and AI-augmented music become increasingly mainstream, many researchers in the music AI community may wonder: what research frontiers remain unexplored? This paper outlines several key areas within music AI research that present significant opportunities for further investigation. We begin by examining foundational representation models and highlight emerging efforts toward explainability and interpretability. We then discuss the evolution toward multimodal systems, provide an overview of the current landscape of music datasets and their limitations, and address the growing importance of model efficiency in both training and deployment. Next, we explore applied directions, focusing first on generative models. We review recent systems, their computational constraints, and persistent challenges related to evaluation and controllability. We then examine extensions of these generative approaches to multimodal settings and their integration into artists' workflows, including applications in music editing, captioning, production, transcription, source separation, performance, discovery, and education. Finally, we explore copyright implications of generative music and propose strategies to safeguard artist rights. While not exhaustive, this survey aims to illuminate promising research directions enabled by recent developments in music foundation models.

Related papers

Twenty-Five Years of MIR Research: Achievements, Practices, Evaluations, and Future Challenges [68.49490211993141]
We trace the evolution of Music Information Retrieval (MIR) over the past 25 years. MIR gathers all kinds of research related to music informatics. We review a set of successful practices that fuel the rapid development of MIR research.
arXiv Detail & Related papers (2025-11-10T15:32:23Z)
Real Deep Research for AI, Robotics and Beyond [85.87181330763548]
We present Real Deep Research (RDR), a comprehensive framework applied to the domains of AI and robotics. The main paper details the construction of the RDR pipeline, while the appendix provides extensive results across each analyzed topic.
arXiv Detail & Related papers (2025-10-23T17:59:05Z)
Vision-to-Music Generation: A Survey [10.993775589904251]
Vision-to-music generation shows vast application prospects in fields such as film scoring, short video creation, and dance music synthesis. Research in vision-to-music is still in its preliminary stage due to its complex internal structure and the difficulty of modeling dynamic relationships with video. Existing surveys focus on general music generation without comprehensive discussion on vision-to-music.
arXiv Detail & Related papers (2025-03-27T08:21:54Z)
A Multimodal Symphony: Integrating Taste and Sound through Generative AI [1.2749527861829049]
This article explores multimodal generative models capable of converting taste information into music. We present an experiment in which a fine-tuned version of a generative music model (MusicGEN) is used to generate music based on detailed taste descriptions provided for each musical piece.
arXiv Detail & Related papers (2025-03-04T17:48:48Z)
A Survey of Foundation Models for Music Understanding [60.83532699497597]
This work is one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models in respect of their music comprehension abilities.
arXiv Detail & Related papers (2024-09-15T03:34:14Z)
Applications and Advances of Artificial Intelligence in Music Generation: A Review [0.04551615447454769]
This paper provides a systematic review of the latest research advancements in AI music generation. It covers key technologies, models, datasets, evaluation methods, and their practical applications across various fields.
arXiv Detail & Related papers (2024-09-03T13:50:55Z)
Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations [52.11801730860999]
In recent years, the robot learning community has shown increasing interest in using deep generative models to capture the complexity of large datasets. We present the different types of models that the community has explored, such as energy-based models, diffusion models, action value maps, or generative adversarial networks. We also present the different types of applications in which deep generative models have been used, from grasp generation to trajectory generation or cost learning.
arXiv Detail & Related papers (2024-08-08T11:34:31Z)
Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives [10.16399860867284]
The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs) has marked a new era of Natural Language Processing (NLP). This paper explores the current state of these cutting-edge technologies, demonstrating their remarkable advancements and wide-ranging applications.
arXiv Detail & Related papers (2024-07-20T18:48:35Z)
Reducing Barriers to the Use of Marginalised Music Genres in AI [7.140590440016289]
This project aims to explore the eXplainable AI (XAI) challenges and opportunities associated with reducing barriers to using marginalised genres of music with AI models. XAI opportunities identified included topics of improving transparency and control of AI models, explaining the ethics and bias of AI models, fine tuning large models with small datasets to reduce bias, and explaining style-transfer opportunities with AI models. We are now building on this project to bring together a global International Responsible AI Music community and invite people to join our network.
arXiv Detail & Related papers (2024-07-18T12:10:04Z)
MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music. To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation). Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
Deepfake Generation and Detection: A Benchmark and Survey [134.19054491600832]
Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions. This survey comprehensively reviews the latest developments in deepfake generation and detection. We focus on researching four representative deepfake fields: face swapping, face reenactment, talking face generation, and facial attribute editing.
arXiv Detail & Related papers (2024-03-26T17:12:34Z)
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond [84.95530356322621]
This survey presents a systematic review of the advancements in code intelligence. It covers over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence.
arXiv Detail & Related papers (2024-03-21T08:54:56Z)
Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI [7.391173255888337]
Generative AI models for music and the arts are increasingly complex and hard to understand. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on generative AI models. This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE) have on music generation performance.
arXiv Detail & Related papers (2023-11-14T17:27:30Z)
An Autoethnographic Exploration of XAI in Algorithmic Composition [7.775986202112564]
This paper introduces an autoethnographic study of the use of the MeasureVAE generative music XAI model with interpretable latent dimensions trained on Irish music. Findings suggest that the exploratory nature of the music-making workflow foregrounds musical features of the training dataset rather than features of the generative model itself.
arXiv Detail & Related papers (2023-08-11T12:03:17Z)
MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
A Survey on Artificial Intelligence for Music Generation: Agents, Domains and Perspectives [10.349825060515181]
We describe how humans compose music and how new AI systems could imitate such a process. To understand how AI models and algorithms generate music, we explore, analyze and describe the agents that take part in the music generation process.
arXiv Detail & Related papers (2022-10-25T11:54:30Z)
Artificial Musical Intelligence: A Survey [51.477064918121336]
Music has become an increasingly prevalent domain of machine learning and artificial intelligence research. This article provides a definition of musical intelligence, introduces a taxonomy of its constituent components, and surveys the wide range of AI methods that can be, and have been, brought to bear in its pursuit.
arXiv Detail & Related papers (2020-06-17T04:46:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.