From West to East: Who can understand the music of the others better?
- URL: http://arxiv.org/abs/2307.09795v1
- Date: Wed, 19 Jul 2023 07:29:14 GMT
- Title: From West to East: Who can understand the music of the others better?
- Authors: Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos
- Abstract summary: We leverage transfer learning methods to derive insights about similarities between different music cultures.
We use two Western music datasets, two traditional/folk datasets coming from eastern Mediterranean cultures, and two datasets belonging to Indian art music.
Three deep audio embedding models are trained and transferred across domains, including two CNN-based and a Transformer-based architecture, to perform auto-tagging for each target domain dataset.
- Score: 91.78564268397139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in MIR have led to several benchmark deep learning models
whose embeddings can be used for a variety of downstream tasks. At the same
time, the vast majority of these models have been trained on Western pop/rock
music and related styles. This leads to research questions on whether these
models can be used to learn representations for different music cultures and
styles, or whether we can build similar music audio embedding models trained on
data from different cultures or styles. To that end, we leverage transfer
learning methods to derive insights about the similarities between the
different music cultures to which the data belong. We use two Western music
datasets, two traditional/folk datasets coming from eastern Mediterranean
cultures, and two datasets belonging to Indian art music. Three deep audio
embedding models are trained and transferred across domains, including two
CNN-based and a Transformer-based architecture, to perform auto-tagging for
each target domain dataset. Experimental results show that competitive
performance is achieved in all domains via transfer learning, while the best
source dataset varies for each music culture. The implementation and the
trained models are both provided in a public repository.
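To make the transfer setup concrete, below is a minimal sketch of how a pre-trained CNN embedding model can be repurposed for multi-label auto-tagging in a new domain. It is an illustration only: the class names, layer sizes, and the freeze-the-backbone choice are assumptions, not the architectures from the authors' repository.

```python
# Minimal transfer-learning sketch for cross-cultural auto-tagging
# (illustrative; the real architectures live in the authors' repository).
import torch
import torch.nn as nn

class SourceCNN(nn.Module):
    """Stand-in for a CNN audio embedding model pre-trained on a source dataset."""
    def __init__(self, n_mels=96, emb_dim=256, n_source_tags=50):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
            nn.Flatten(), nn.Linear(32 * 8 * 8, emb_dim), nn.ReLU(),
        )
        self.head = nn.Linear(emb_dim, n_source_tags)  # source-domain tag head

    def forward(self, x):
        return self.head(self.backbone(x))

def transfer(model: SourceCNN, n_target_tags: int, freeze_backbone: bool = True):
    """Swap the tag head for the target domain; optionally freeze the embeddings."""
    if freeze_backbone:
        for p in model.backbone.parameters():
            p.requires_grad = False
    model.head = nn.Linear(model.head.in_features, n_target_tags)
    return model

model = transfer(SourceCNN(), n_target_tags=30)
mel = torch.randn(4, 1, 96, 256)             # batch of log-mel spectrograms
tags = torch.randint(0, 2, (4, 30)).float()  # multi-hot target-domain tags
loss = nn.BCEWithLogitsLoss()(model(mel), tags)  # auto-tagging is multi-label
```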
Related papers
- Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
- MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models [11.834712543531756]
MuChoMusic is a benchmark for evaluating music understanding in multimodal language models focused on audio.
It comprises 1,187 multiple-choice questions, all validated by human annotators, on 644 music tracks sourced from two publicly available music datasets.
We evaluate five open-source models and identify several pitfalls, including an over-reliance on the language modality.
arXiv Detail & Related papers (2024-08-02T15:34:05Z)
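For context, a multiple-choice benchmark of this kind is typically scored as plain accuracy over the validated questions. The sketch below shows such a loop; the field names (audio, question, options, answer_idx) are hypothetical stand-ins, not MuChoMusic's actual schema.

```python
# Generic multiple-choice evaluation loop (illustrative; field names are
# hypothetical, not MuChoMusic's actual schema).
from typing import Callable

def evaluate(questions: list[dict],
             answer_fn: Callable[[str, str, list[str]], int]) -> float:
    """answer_fn maps (audio_path, question, options) to a chosen option index."""
    correct = 0
    for q in questions:
        pred = answer_fn(q["audio"], q["question"], q["options"])
        correct += int(pred == q["answer_idx"])
    return correct / len(questions)

# Trivial baseline: always pick the first option.
demo = [{"audio": "track.wav", "question": "Which instrument leads?",
         "options": ["sitar", "guitar", "violin", "ney"], "answer_idx": 0}]
print(evaluate(demo, lambda audio, question, options: 0))
```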
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
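One plausible reading of bar synchronization is to interleave the same measure from every track so that aligned material stays adjacent in the token stream. The toy function below illustrates that idea on plain ABC strings; the tag format and the function itself are assumptions for illustration, not MuPT's actual SMT-ABC specification.

```python
# Hypothetical illustration of synchronizing multi-track ABC by interleaving
# bars; the real SMT-ABC specification is defined in the MuPT paper.
def interleave_tracks(tracks: dict[str, str]) -> str:
    """Group bar i of every track together so measures stay aligned."""
    split = {name: body.strip("|").split("|") for name, body in tracks.items()}
    n_bars = max(len(bars) for bars in split.values())
    out = []
    for i in range(n_bars):
        for name, bars in split.items():
            if i < len(bars):
                out.append(f"[{name}] {bars[i].strip()} |")
        out.append("")  # blank line between synchronized bar groups
    return "\n".join(out)

melody = "C D E F | G A B c | c B A G"
bass   = "C,2 G,2 | C,2 G,2 | C,4"
print(interleave_tracks({"V1": melody, "V2": bass}))
```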
- WikiMuTe: A web-sourced dataset of semantic descriptions for music audio [7.4327407361824935]
We present WikiMuTe, a new and open dataset containing rich semantic descriptions of music.
The data is sourced from Wikipedia's rich catalogue of articles covering musical works.
We train a model that jointly learns text and audio representations and performs cross-modal retrieval.
arXiv Detail & Related papers (2023-12-14T18:38:02Z)
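Once text and audio share an embedding space, cross-modal retrieval reduces to nearest-neighbor search. A minimal sketch, assuming precomputed embeddings from stand-in encoders rather than WikiMuTe's trained model:

```python
# Cross-modal retrieval sketch: rank audio clips by cosine similarity to a
# text query in a shared embedding space (embeddings here are random stand-ins).
import numpy as np

def rank_audio(text_emb: np.ndarray, audio_embs: np.ndarray) -> np.ndarray:
    """Return audio indices sorted by cosine similarity to the text embedding."""
    t = text_emb / np.linalg.norm(text_emb)
    a = audio_embs / np.linalg.norm(audio_embs, axis=1, keepdims=True)
    return np.argsort(a @ t)[::-1]

rng = np.random.default_rng(0)
query = rng.normal(size=128)              # embedding of e.g. "melancholic piano piece"
catalogue = rng.normal(size=(1000, 128))  # embeddings of 1000 audio clips
print(rank_audio(query, catalogue)[:5])   # top-5 retrieved clip indices
```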
- A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, summing to around 80 hours of data.
The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata regarding instrumentation, geography and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z)
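A timestamped-link dataset like this is typically materialized by downloading the audio and trimming it to the annotated span. The sketch below does this with yt-dlp and ffmpeg; the timestamp arguments and file names are assumptions for illustration, not Lyra's actual loader.

```python
# Sketch of retrieving one timestamped segment: download audio with yt-dlp,
# then trim to the annotated span with ffmpeg (names are hypothetical).
import subprocess
from yt_dlp import YoutubeDL

def fetch_segment(url: str, start: str, end: str, out: str = "piece.wav"):
    with YoutubeDL({"format": "bestaudio", "outtmpl": "full.%(ext)s"}) as ydl:
        info = ydl.extract_info(url)       # downloads the full audio track
        src = ydl.prepare_filename(info)
    subprocess.run(["ffmpeg", "-y", "-ss", start, "-to", end,
                    "-i", src, out], check=True)

# fetch_segment("https://www.youtube.com/watch?v=...", "00:01:10", "00:04:32")
```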
- Contrastive Audio-Language Learning for Music [13.699088044513562]
MusCALL is a framework for Music Contrastive Audio-Language Learning.
Our approach consists of a dual-encoder architecture that learns the alignment between pairs of music audio and descriptive sentences.
arXiv Detail & Related papers (2022-08-25T16:55:15Z)
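Dual-encoder alignment of this kind is usually trained with a symmetric contrastive (InfoNCE) objective over matched audio-sentence pairs. A minimal sketch with stand-in encoder outputs; this is the generic CLIP-style loss, not necessarily MusCALL's exact formulation:

```python
# Minimal dual-encoder contrastive sketch (stand-in encoder outputs; the real
# model uses dedicated audio and text backbones).
import torch
import torch.nn.functional as F

def clip_style_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (audio, sentence) pairs sit on the diagonal."""
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature
    labels = torch.arange(len(a))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

audio_emb = torch.randn(8, 256)  # batch of audio-tower outputs
text_emb = torch.randn(8, 256)   # matching sentence-tower outputs
print(clip_style_loss(audio_emb, text_emb))
```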
- Listener Modeling and Context-aware Music Recommendation Based on Country Archetypes [10.19712238203935]
Music preferences are strongly shaped by the cultural and socio-economic background of the listener.
We use state-of-the-art unsupervised learning techniques to investigate country profiles of music preferences at the fine-grained level of music tracks.
We propose a context-aware music recommendation system that leverages implicit user feedback.
arXiv Detail & Related papers (2020-09-11T17:59:04Z)
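Country profiles of this sort can be sketched as clustering of per-country preference vectors. The example below uses k-means on synthetic data purely for illustration; the paper's actual unsupervised techniques and listening data differ.

```python
# Sketch of grouping countries into preference archetypes with unsupervised
# learning (synthetic data; the paper works with real listening events).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Rows: countries; columns: normalized play counts over, say, 500 tracks.
prefs = rng.random((60, 500))
prefs /= prefs.sum(axis=1, keepdims=True)

archetypes = KMeans(n_clusters=4, n_init=10, random_state=0).fit(prefs)
print(archetypes.labels_[:10])  # archetype assignment of the first 10 countries
```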
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
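A heavily simplified way to connect body motion to sound separation is to encode the keypoint sequence and let it condition a mask over the mixture spectrogram. The toy modules below illustrate that shape of pipeline; they are assumptions, not the paper's context-aware graph network or fusion model.

```python
# Toy audio-visual fusion sketch: keypoints -> motion summary -> spectrogram
# mask over the mixture (illustrative stand-in for the paper's pipeline).
import torch
import torch.nn as nn

class KeypointEncoder(nn.Module):
    def __init__(self, n_joints=25, dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_joints * 2, dim, batch_first=True)

    def forward(self, kp):                 # kp: (batch, frames, joints, 2)
        b, f, j, c = kp.shape
        _, h = self.rnn(kp.reshape(b, f, j * c))
        return h[-1]                       # (batch, dim) motion summary

class MaskHead(nn.Module):
    def __init__(self, dim=128, n_freq=512):
        super().__init__()
        self.proj = nn.Linear(dim, n_freq)

    def forward(self, motion, mixture):    # mixture: (batch, n_freq, frames)
        mask = torch.sigmoid(self.proj(motion)).unsqueeze(-1)
        return mask * mixture              # spectrogram of the cued player

kp = torch.randn(2, 100, 25, 2)            # 100 frames of 2-D body keypoints
mix = torch.rand(2, 512, 400)              # mixture magnitude spectrogram
print(MaskHead()(KeypointEncoder()(kp), mix).shape)
```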
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.