Adversarial-MidiBERT: Symbolic Music Understanding Model Based on Unbias Pre-training and Mask Fine-tuning
- URL: http://arxiv.org/abs/2407.08306v1
- Date: Thu, 11 Jul 2024 08:54:38 GMT
- Title: Adversarial-MidiBERT: Symbolic Music Understanding Model Based on Unbias Pre-training and Mask Fine-tuning
- Authors: Zijian Zhao
- Abstract summary: We propose Adversarial-MidiBERT, a symbolic music understanding model based on Bidirectional Encoder Representations from Transformers (BERT).
We introduce an unbiased pre-training method based on adversarial learning to minimize the participation of tokens that lead to biases during training. Furthermore, we propose a mask fine-tuning method to narrow the data gap between pre-training and fine-tuning, which can help the model converge faster and perform better.
- Score: 2.61072980439312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an important part of Music Information Retrieval (MIR), Symbolic Music Understanding (SMU) has gained substantial attention, as it can assist musicians and amateurs in learning and creating music. Recently, pre-trained language models have been widely adopted in SMU because symbolic music shares strong similarities with natural language, and pre-training also helps make full use of limited music data. However, biases such as sexism, ageism, and racism have been observed in pre-trained language models, which is attributed to the imbalanced distribution of their training data. Such bias also significantly affects the performance of downstream tasks, and the same problem arises in SMU. To address this challenge, we propose Adversarial-MidiBERT, a symbolic music understanding model based on Bidirectional Encoder Representations from Transformers (BERT). We introduce an unbiased pre-training method based on adversarial learning to minimize the participation of tokens that lead to biases during training. Furthermore, we propose a mask fine-tuning method to narrow the data gap between pre-training and fine-tuning, which helps the model converge faster and perform better. We evaluate our method on four music understanding tasks, and our approach demonstrates excellent performance on all of them. The code for our model is publicly available at https://github.com/RS2002/Adversarial-MidiBERT.
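The abstract describes both techniques only at a high level. The unbiased pre-training is said to rely on adversarial learning; one common way to realise such an objective is a gradient-reversal layer, sketched below in PyTorch. This is a generic adversarial-debiasing pattern under stated assumptions, not the released Adversarial-MidiBERT procedure, and `bias_head`, `bias_labels`, and `lam` are hypothetical names.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; reverses gradients in the backward pass.
    A standard building block for adversarial training, shown only to
    illustrate the general idea of adversarial debiasing; the paper's exact
    unbiased pre-training procedure may differ."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient w.r.t. x is negated and scaled; lam has no gradient.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Hypothetical usage: an auxiliary classifier tries to predict a bias-related
# attribute from the encoder's hidden states, while the reversed gradient
# pushes the encoder to discard that information.
# bias_logits = bias_head(grad_reverse(hidden_states))
# loss = mlm_loss + bias_criterion(bias_logits, bias_labels)
```

Similarly, the mask fine-tuning idea (narrowing the input gap between MLM-style pre-training and fine-tuning) can be illustrated by randomly masking tokens in fine-tuning batches. Again, this is a minimal sketch assuming a BERT-style token interface; `mask_token_id`, `mask_prob`, and `pad_token_id` are placeholders rather than values from the paper.

```python
import torch

def mask_fine_tune_batch(token_ids, mask_token_id, mask_prob=0.15, pad_token_id=0):
    """Randomly replace a fraction of non-padding tokens with [MASK] during
    fine-tuning so inputs resemble those seen in MLM pre-training.
    Illustrative sketch only; not the released Adversarial-MidiBERT code."""
    token_ids = token_ids.clone()
    maskable = token_ids != pad_token_id                              # never mask padding
    mask = (torch.rand_like(token_ids, dtype=torch.float) < mask_prob) & maskable
    token_ids[mask] = mask_token_id                                   # inject [MASK] tokens
    return token_ids

# Hypothetical fine-tuning step:
# masked_ids = mask_fine_tune_batch(batch_ids, mask_token_id=3)
# logits = model(masked_ids)        # classification head on top of the BERT encoder
# loss = criterion(logits, labels)
```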
Related papers
- Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
- Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach [49.2787113554916]
Estimating music piece difficulty is important for organizing educational music collections.
Our work employs explainable descriptors for difficulty estimation in symbolic music representations.
Our approach, evaluated on piano repertoire categorized into 9 classes, achieved 41.4% accuracy independently, with a mean squared error (MSE) of 1.7.
arXiv Detail & Related papers (2024-08-01T11:23:42Z)
- An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging [6.363158395541767]
Self-supervised learning has emerged as a powerful way to pre-train generalizable machine learning models on large amounts of unlabeled data.
In this study, we investigate and compare the performance of new self-supervised methods for music tagging.
arXiv Detail & Related papers (2024-04-14T07:56:08Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Supervised and Unsupervised Learning of Audio Representations for Music Understanding [9.239657838690226]
We show how the domain of pre-training datasets affects the adequacy of the resulting audio embeddings for downstream tasks.
We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-10-07T20:07:35Z)
- Learning music audio representations via weak language supervision [14.335950077921435]
We design a multimodal architecture for music and language pre-training (MuLaP) optimised via a set of proxy tasks.
Weak supervision is provided in the form of noisy natural language descriptions conveying the overall musical content of the track.
We demonstrate the usefulness of our approach by comparing the performance of audio representations produced by the same audio backbone with different training strategies.
arXiv Detail & Related papers (2021-12-08T10:30:52Z)
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
- Contrastive Learning of Musical Representations [0.0]
We introduce SimCLR to the music domain, forming CLMR, a framework for self-supervised learning on raw music waveforms.
We show that CLMR's representations are transferable to out-of-domain datasets, indicating that they capture important musical knowledge.
To foster future research on self-supervised learning in music, we publicly release the pre-trained models and the source code of all experiments on GitHub (a minimal sketch of the SimCLR-style contrastive objective this framework builds on appears after this list).
arXiv Detail & Related papers (2021-03-17T02:53:55Z)
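The CLMR entry above builds on SimCLR's contrastive objective. For reference, the following is a minimal sketch of the NT-Xent loss that this family of methods optimises; it is a generic PyTorch illustration, not code from the CLMR release, and the batch layout (two augmented views per clip) is an assumption.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss over a batch of paired embeddings.

    z1, z2: (N, D) projections of two augmented views of the same N audio clips.
    Generic illustration of the contrastive objective, not the CLMR release code.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                         # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
    # The positive for view i is the other view of the same clip: i <-> i + n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```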