MARBLE: Music Audio Representation Benchmark for Universal Evaluation
- URL: http://arxiv.org/abs/2306.10548v4
- Date: Thu, 23 Nov 2023 10:31:17 GMT
- Title: MARBLE: Music Audio Representation Benchmark for Universal Evaluation
- Authors: Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin,
Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang,
Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger
Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike
Guo, Jie Fu
- Abstract summary: We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standardized assessment of the representations of all open-source pre-trained models developed on music recordings, which serve as baselines.
- Score: 79.25065218663458
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In the era of extensive intersection between art and Artificial Intelligence
(AI), such as image generation and fiction co-creation, AI for music remains
relatively nascent, particularly in music understanding. This is evident in the
limited work on deep music representations, the scarcity of large-scale
datasets, and the absence of a universal and community-driven benchmark. To
address this issue, we introduce the Music Audio Representation Benchmark for
universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various
Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy
with four hierarchy levels, including acoustic, performance, score, and
high-level description. We then establish a unified protocol based on 14 tasks
on 8 publicly available datasets, providing a fair and standardized assessment of
the representations of all open-source pre-trained models developed on music
recordings, which serve as baselines. In addition, MARBLE offers an easy-to-use,
extendable, and reproducible suite for the community, with a clear statement on
dataset copyright issues. Results suggest that recently proposed large-scale pre-trained
musical language models perform the best in most tasks, with room for further
improvement. The leaderboard and toolkit repository are published at
https://marble-bm.shef.ac.uk to promote future music AI research.
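To make the unified protocol concrete, below is a minimal sketch of a MARBLE-style evaluation: a pre-trained music encoder is kept frozen, clip-level embeddings are extracted, and a lightweight probe is trained per downstream task. The function names and synthetic data are illustrative assumptions, not the official MARBLE toolkit API.

```python
# Minimal sketch of a MARBLE-style probing protocol: freeze a pre-trained
# model, extract clip-level embeddings, and fit a shallow probe for one
# downstream task. `extract_embeddings` is a hypothetical stand-in, not
# part of the official MARBLE toolkit.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def extract_embeddings(clips: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen pre-trained music encoder.

    A real run would forward raw audio through a self-supervised music
    model and mean-pool its hidden states over time.
    """
    return rng.normal(size=(len(clips), 768))  # fake 768-d embeddings

# Synthetic stand-ins for one downstream task (e.g. genre classification).
train_audio, train_labels = np.zeros((200, 16000)), np.arange(200) % 10
test_audio, test_labels = np.zeros((50, 16000)), np.arange(50) % 10

# The encoder stays frozen; only the shallow probe is trained.
probe = LogisticRegression(max_iter=1000)
probe.fit(extract_embeddings(train_audio), train_labels)
preds = probe.predict(extract_embeddings(test_audio))
print(f"probe accuracy: {accuracy_score(test_labels, preds):.3f}")
```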
Related papers
- Benchmarking Sub-Genre Classification For Mainstage Dance Music [6.042939894766715]
This work introduces a novel benchmark comprising a new dataset and a baseline.
Our dataset extends the number of sub-genres to cover the most recent mainstage live sets performed by top DJs at music festivals worldwide.
For the baseline, we developed deep learning models that outperform current state-of-the-art multimodal language models.
arXiv Detail & Related papers (2024-09-10T17:54:00Z) - The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models [63.53530525014976]
ZIQI-Eval is a benchmark specifically designed to evaluate the music-related capabilities of large language models (LLMs).
ZIQI-Eval encompasses a wide range of questions, covering 10 major categories and 56 subcategories, resulting in over 14,000 meticulously curated data entries.
Results indicate that all LLMs perform poorly on the ZIQI-Eval benchmark, suggesting significant room for improvement in their musical capabilities.
arXiv Detail & Related papers (2024-06-22T16:24:42Z) - MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing [3.3162176082220975]
We present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high-quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamics, articulation, and harmony for 742 professional music performances by 23 professional musicians.
To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date.
arXiv Detail & Related papers (2024-06-10T15:37:46Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response [42.73982391253872]
MusiLingo is a novel system for music caption generation and music-related query responses.
We train it on an extensive music caption dataset and fine-tune it with instructional data.
Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs.
arXiv Detail & Related papers (2023-09-15T19:31:40Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token-interleaving patterns (a sketch of such an interleaving pattern appears after this list).
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, totalling around 80 hours of data.
The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata on instrumentation, geography, and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z) - MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre Classification [1.8275108630751844]
Imbalanced music genre classification is a crucial task in the Music Information Retrieval (MIR) field.
Most of the existing models are designed for class-balanced music datasets.
We propose a novel mechanism named Multi-instance Attention (MATT) to boost performance in identifying tail classes.
arXiv Detail & Related papers (2022-09-09T03:52:44Z) - Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z) - Contrastive Learning of Musical Representations [0.0]
We introduce SimCLR to the music domain, forming CLMR, a framework for self-supervised learning on raw music waveforms (see the contrastive-loss sketch after this list).
We show that CLMR's representations are transferable using out-of-domain datasets, indicating that they capture important musical knowledge.
To foster reproducibility and future research on self-supervised learning in music, we publicly release the pre-trained models and the source code of all experiments in this paper on GitHub.
arXiv Detail & Related papers (2021-03-17T02:53:55Z)
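As referenced in the MusicGen entry above, a single-stage LM can model several parallel streams of codec tokens by interleaving them with small per-codebook delays. The sketch below shows one such delay pattern; the exact layout and padding token are assumptions for illustration, not MusicGen's released implementation.

```python
# Illustrative "delay" interleaving of K parallel codebook streams into a
# grid that a single-stage transformer can model step by step. The pattern
# and PAD token are assumptions, not MusicGen's actual code.
import numpy as np

PAD = -1  # placeholder id for positions with no token after shifting

def delay_interleave(codes: np.ndarray) -> np.ndarray:
    """Shift codebook k right by k steps, so at transformer step t the
    model predicts codebook k's token for audio frame t - k.

    codes: (K, T) integer token grid, one row per codebook stream.
    Returns a (K, T + K - 1) delayed grid, PAD where no token exists.
    """
    K, T = codes.shape
    out = np.full((K, T + K - 1), PAD, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

codes = np.arange(8).reshape(2, 4)  # 2 codebooks, 4 frames
print(delay_interleave(codes))
# [[ 0  1  2  3 -1]
#  [-1  4  5  6  7]]
```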
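As referenced in the CLMR entry above, SimCLR-style training pulls together projections of two augmented views of the same clip while pushing apart all other pairs in the batch. Below is a minimal NT-Xent loss sketch of that objective; it is an illustration under stated assumptions, not the released CLMR code.

```python
# Minimal NT-Xent (SimCLR) contrastive loss over two augmented views,
# as a CLMR-style objective would apply to raw-audio embeddings.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """z1, z2: (N, D) projections of two augmented views of N clips."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / tau                               # scaled cosine sims
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    n = z1.size(0)
    # The positive for clip i's first view is its second view, and vice versa.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Random "views" stand in for projected embeddings of augmented waveforms.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent(z1, z2).item())
```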