Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
- URL: http://arxiv.org/abs/2502.07328v3
- Date: Tue, 06 May 2025 09:48:44 GMT
- Title: Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
- Authors: Atharva Mehta, Shivam Chauhan, Amirbek Djanibekov, Atharva Kulkarni, Gus Xia, Monojit Choudhury
- Abstract summary: We present a study of the datasets and research papers for music generation. We find that only 5.7% of the total hours of existing music datasets come from non-Western genres.
- Score: 13.568559786822457
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The advent of Music-Language Models has greatly enhanced the automatic music generation capability of AI systems, but they are also limited in their coverage of the musical genres and cultures of the world. We present a study of the datasets and research papers for music generation and quantify the bias and under-representation of genres. We find that only 5.7% of the total hours of existing music datasets come from non-Western genres, which naturally leads to disparate performance of the models across genres. We then investigate the efficacy of Parameter-Efficient Fine-Tuning (PEFT) techniques in mitigating this bias. Our experiments with two popular models -- MusicGen and Mustango, for two underrepresented non-Western music traditions -- Hindustani Classical and Turkish Makam music, highlight the promises as well as the non-triviality of cross-genre adaptation of music through small datasets, implying the need for more equitable baseline music-language models that are designed for cross-cultural transfer learning.
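As context for the PEFT experiments described in the abstract: a common parameter-efficient approach is to attach LoRA adapters to a frozen pretrained model and train only those low-rank weights on the target genre. The sketch below is a minimal, hypothetical illustration using the Hugging Face transformers and peft libraries with a public MusicGen checkpoint; the rank, target modules, and prompt are assumptions for illustration, not the paper's reported configuration.

```python
# Minimal, hypothetical sketch of LoRA-based PEFT for adapting MusicGen to an
# underrepresented genre (e.g., Hindustani Classical). Assumes the Hugging Face
# `transformers` and `peft` packages; rank, alpha, target modules, and the
# prompt below are illustrative choices, not the paper's reported settings.
from transformers import AutoProcessor, MusicgenForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

lora_cfg = LoraConfig(
    r=8,                                  # low-rank adapter dimension (assumed)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # decoder attention projections
)
model = get_peft_model(base, lora_cfg)    # freezes base weights, adds adapters
model.print_trainable_parameters()        # only a small fraction is trainable

# Genre-specific text prompts would be paired with target-tradition audio
# during fine-tuning; the training loop itself is omitted here.
processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
inputs = processor(
    text=["slow alap in raga Yaman on sitar and tabla"],  # hypothetical prompt
    padding=True,
    return_tensors="pt",
)
```

Because only the adapter weights are updated, a setup along these lines can be trained on the small genre-specific datasets discussed in the paper without modifying the full model.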
Related papers
- CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning [55.80320947983555]
CultureMERT-95M is a multi-culturally adapted foundation model developed to enhance cross-cultural music representation learning. Training on a 650-hour multi-cultural data mix results in an average improvement of 4.9% in ROC-AUC and AP across diverse non-Western music auto-tagging tasks. Task arithmetic performs on par with our multi-culturally trained model on non-Western auto-tagging tasks and shows no regression on Western datasets (see the generic task-arithmetic sketch after this list).
arXiv Detail & Related papers (2025-06-21T21:16:39Z) - Universal Music Representations? Evaluating Foundation Models on World Music Corpora [65.72891334156706]
Foundation models have revolutionized music information retrieval, but questions remain about their ability to generalize. This paper presents a comprehensive evaluation of five state-of-the-art audio foundation models across six musical corpora.
arXiv Detail & Related papers (2025-06-20T15:06:44Z) - Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - From West to East: Who can understand the music of the others better? [91.78564268397139]
We leverage transfer learning methods to derive insights about similarities between different music cultures.
We use two Western music datasets, two traditional/folk datasets from eastern Mediterranean cultures, and two datasets of Indian art music.
Three deep audio embedding models, two CNN-based and one Transformer-based, are trained and transferred across domains to perform auto-tagging on each target-domain dataset.
arXiv Detail & Related papers (2023-07-19T07:29:14Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - Transfer Learning for Underrepresented Music Generation [0.9645196221785693]
We identify Iranian folk music as an example of such an OOD genre for MusicVAE, a large generative music model.
We find that a combinational creativity transfer learning approach can efficiently adapt MusicVAE to an Iranian folk music dataset, indicating potential for generating underrepresented music genres in the future.
arXiv Detail & Related papers (2023-06-01T01:53:10Z) - A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, totaling around 80 hours of data.
The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata regarding instrumentation, geography, and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z) - Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z) - Personalized Popular Music Generation Using Imitation and Structure [1.971709238332434]
We propose a statistical machine learning model that is able to capture and imitate the structure, melody, chord, and bass style from a given example seed song.
An evaluation using 10 pop songs shows that our new representations and methods are able to create high-quality stylistic music.
arXiv Detail & Related papers (2021-05-10T23:43:00Z) - Modeling the Music Genre Perception across Language-Bound Cultures [10.223656553455003]
We study the feasibility of obtaining relevant cross-lingual, culture-specific music genre annotations.
We show that unsupervised cross-lingual music genre annotation is feasible with high accuracy.
We introduce a new, domain-dependent cross-lingual corpus to benchmark state-of-the-art multilingual pre-trained embedding models.
arXiv Detail & Related papers (2020-10-13T12:20:32Z)
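Regarding the task arithmetic mentioned in the CultureMERT entry above, the following is a generic sketch of the idea: merging a base checkpoint with scaled task vectors (fine-tuned weights minus base weights) from culture-specific fine-tunes. The checkpoint names and scaling factor are hypothetical and do not come from that paper.

```python
# Generic sketch of task arithmetic: add scaled "task vectors"
# (fine-tuned weights minus base weights) to a base checkpoint.
# Checkpoint paths and the scale value are hypothetical.
import torch

def merge_by_task_arithmetic(base_state, finetuned_states, scale=0.5):
    """Return base + scale * sum_i (finetuned_i - base), per parameter tensor."""
    merged = {name: tensor.clone() for name, tensor in base_state.items()}
    for ft_state in finetuned_states:
        for name in merged:
            merged[name] += scale * (ft_state[name] - base_state[name])
    return merged

# Usage with hypothetical culture-specific fine-tunes of the same base model:
# base = torch.load("base_model.pt")          # e.g., Western-trained weights
# tuned = [torch.load("turkish_makam.pt"),
#          torch.load("hindustani.pt")]
# merged_state = merge_by_task_arithmetic(base, tuned, scale=0.5)
# model.load_state_dict(merged_state)
```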