Universal Music Representations? Evaluating Foundation Models on World Music Corpora
- URL: http://arxiv.org/abs/2506.17055v1
- Date: Fri, 20 Jun 2025 15:06:44 GMT
- Title: Universal Music Representations? Evaluating Foundation Models on World Music Corpora
- Authors: Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos
- Abstract summary: Foundation models have revolutionized music information retrieval, but questions remain about their ability to generalize. This paper presents a comprehensive evaluation of five state-of-the-art audio foundation models across six musical corpora.
- Score: 65.72891334156706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models have revolutionized music information retrieval, but questions remain about their ability to generalize across diverse musical traditions. This paper presents a comprehensive evaluation of five state-of-the-art audio foundation models across six musical corpora spanning Western popular, Greek, Turkish, and Indian classical traditions. We employ three complementary methodologies to investigate these models' cross-cultural capabilities: probing to assess inherent representations, targeted supervised fine-tuning of 1-2 layers, and multi-label few-shot learning for low-resource scenarios. Our analysis shows varying cross-cultural generalization, with larger models typically outperforming on non-Western music, though results decline for culturally distant traditions. Notably, our approaches achieve state-of-the-art performance on five out of six evaluated datasets, demonstrating the effectiveness of foundation models for world music understanding. We also find that our targeted fine-tuning approach does not consistently outperform probing across all settings, suggesting foundation models already encode substantial musical knowledge. Our evaluation framework and benchmarking results contribute to understanding how far current models are from achieving universal music representations while establishing metrics for future progress.
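The probing methodology named in the abstract can be illustrated with a minimal sketch. It assumes a linear probe: one sigmoid output per tag, trained on frozen embeddings for multi-label tagging. The abstract does not specify the probe architecture, so this is an assumption, and the embeddings, tag count, and hyperparameters below are synthetic placeholders rather than values from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_linear_probe(features, labels, lr=0.5, epochs=300, seed=0):
    """Fit a multi-label linear probe on frozen embeddings.

    features: (n_samples, dim) array of embeddings; the backbone is never updated.
    labels:   (n_samples, n_tags) binary tag matrix.
    Minimizes binary cross-entropy with plain full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    k = labels.shape[1]
    W = rng.normal(scale=0.01, size=(d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        probs = sigmoid(features @ W + b)
        grad = (probs - labels) / n  # gradient of mean BCE w.r.t. the logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict_proba(features, W, b):
    return sigmoid(features @ W + b)


# Synthetic stand-in for embeddings extracted from a frozen foundation model
# (hypothetical sizes: 200 clips, 16-dimensional embeddings, 3 tags).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 16))
true_W = rng.normal(size=(16, 3))
Y = (X @ true_W > 0).astype(float)

W, b = train_linear_probe(X, Y)
train_acc = ((predict_proba(X, W, b) > 0.5) == Y).mean()
```

Because only `W` and `b` are trained, the probe's score reflects what the frozen embeddings already encode. Fine-tuning, by contrast, would also update some of the backbone's layers.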
Related papers
- Advancing the Foundation Model for Music Understanding [9.210248657997687]
We introduce a unified foundation model named MuFun for holistic music understanding. Our model features a novel architecture that jointly processes instrumental and lyrical content. We also propose a new benchmark for multi-faceted music understanding called MuCUE.
arXiv Detail & Related papers (2025-08-02T03:33:47Z)
- CultureMERT: Continual Pre-Training for Cross-Cultural Music Representation Learning [55.80320947983555]
CultureMERT-95M is a multi-culturally adapted foundation model developed to enhance cross-cultural music representation learning. Training on a 650-hour multi-cultural data mix results in an average improvement of 4.9% in ROC-AUC and AP across diverse non-Western music auto-tagging tasks. Task arithmetic performs on par with our multi-culturally trained model on non-Western auto-tagging tasks and shows no regression on Western datasets.
arXiv Detail & Related papers (2025-06-21T21:16:39Z)
- From Generality to Mastery: Composer-Style Symbolic Music Generation via Large-Scale Pre-training [4.7205815347741185]
We investigate how general music knowledge learned from a broad corpus can enhance the mastery of specific composer styles. First, we pre-train a REMI-based music generation model on a large corpus of pop, folk, and classical music. Then, we fine-tune it on a small, human-verified dataset from four renowned composers, namely Bach, Mozart, Beethoven, and Chopin.
arXiv Detail & Related papers (2025-06-20T22:20:59Z)
- CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation [61.130639734982395]
We introduce CAIRe, a novel evaluation metric that assesses the degree of cultural relevance of an image. Our framework grounds entities and concepts in the image to a knowledge base and uses factual information to give independent graded judgments for each culture label.
arXiv Detail & Related papers (2025-06-10T17:16:23Z)
- Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models [13.568559786822457]
We present a study of the datasets and research papers for music generation. We find that only 5.7% of the total hours of existing music datasets come from non-Western genres.
arXiv Detail & Related papers (2025-02-11T07:46:29Z)
- Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
- From West to East: Who can understand the music of the others better? [91.78564268397139]
We leverage transfer learning methods to derive insights about similarities between different music cultures.
We use two Western music datasets, two traditional/folk datasets coming from eastern Mediterranean cultures, and two datasets belonging to Indian art music.
Three deep audio embedding models are trained and transferred across domains, including two CNN-based and a Transformer-based architecture, to perform auto-tagging for each target domain dataset.
arXiv Detail & Related papers (2023-07-19T07:29:14Z)
- A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, totaling around 80 hours of data.
The dataset incorporates timestamped YouTube links for retrieving audio and video, along with rich metadata regarding instrumentation, geography, and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z)