Bias beyond Borders: Global Inequalities in AI-Generated Music
- URL: http://arxiv.org/abs/2510.01963v1
- Date: Thu, 02 Oct 2025 12:33:10 GMT
- Title: Bias beyond Borders: Global Inequalities in AI-Generated Music
- Authors: Ahmet Solak, Florian Grötschla, Luca A. Lanzendörfer, Roger Wattenhofer
- Abstract summary: GlobalDISCO is a large-scale dataset consisting of 73k music tracks generated by state-of-the-art commercial generative music models. The dataset spans 147 languages and includes musical style prompts extracted from MusicBrainz and Wikipedia. The dataset is globally balanced, representing musical styles from artists across 79 countries and five continents.
- Score: 39.80452596611506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent years have seen remarkable progress in music generation models, research on their biases across countries, languages, cultures, and musical genres remains underexplored. This gap is compounded by the lack of datasets and benchmarks that capture the global diversity of music. To address these challenges, we introduce GlobalDISCO, a large-scale dataset consisting of 73k music tracks generated by state-of-the-art commercial generative music models, along with paired links to 93k reference tracks in LAION-DISCO-12M. The dataset spans 147 languages and includes musical style prompts extracted from MusicBrainz and Wikipedia. The dataset is globally balanced, representing musical styles from artists across 79 countries and five continents. Our evaluation reveals large disparities in music quality and alignment with reference music between high-resource and low-resource regions. Furthermore, we find marked differences in model performance between mainstream and geographically niche genres, including cases where models generate music for regional genres that more closely align with the distribution of mainstream styles.
Related papers
- Universal Music Representations? Evaluating Foundation Models on World Music Corpora [65.72891334156706]
Foundation models have revolutionized music information retrieval, but questions remain about their ability to generalize. This paper presents a comprehensive evaluation of five state-of-the-art audio foundation models across six musical corpora.
arXiv Detail & Related papers (2025-06-20T15:06:44Z)
- SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling [0.0]
To the best of our knowledge, there are no open-source high-quality datasets representing popular and well-known songs for generative music modeling tasks. Our dataset changes this narrative and provides a dataset that is constructed using actual popular music and world-renowned artists.
arXiv Detail & Related papers (2025-06-17T08:08:08Z)
- Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models [13.568559786822457]
We present a study of the datasets and research papers for music generation. We find that only 5.7% of the total hours of existing music datasets come from non-Western genres.
arXiv Detail & Related papers (2025-02-11T07:46:29Z)
- Missing Melodies: AI Music Generation and its "Nearly" Complete Omission of the Global South [14.147521533363028]
We conducted an analysis of over one million hours of audio datasets used in AI music generation research. We identified a critical gap in the fair representation and inclusion of the musical genres of the Global South in AI research. Although around 40% of these datasets include some form of non-Western music, genres from the Global South account for only 14.6% of the data.
arXiv Detail & Related papers (2024-12-05T12:10:42Z)
- Benchmarking Sub-Genre Classification For Mainstage Dance Music [6.042939894766715]
We introduce a novel benchmark featuring a new dataset and baseline. Our dataset expands the scope of sub-genres to reflect the diversity of recent mainstage live sets performed by leading DJs at global music festivals. This benchmark supports applications such as music recommendation, DJ set curation, and interactive multimedia systems, with video demos provided.
arXiv Detail & Related papers (2024-09-10T17:54:00Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- From West to East: Who can understand the music of the others better? [91.78564268397139]
We leverage transfer learning methods to derive insights about similarities between different music cultures.
We use two Western music datasets, two traditional/folk datasets coming from eastern Mediterranean cultures, and two datasets belonging to Indian art music.
Three deep audio embedding models are trained and transferred across domains, including two CNN-based and a Transformer-based architecture, to perform auto-tagging for each target domain dataset.
arXiv Detail & Related papers (2023-07-19T07:29:14Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, totaling around 80 hours of data.
The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata information with regard to instrumentation, geography and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z)