MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
- URL: http://arxiv.org/abs/2403.10024v1
- Date: Fri, 15 Mar 2024 05:13:38 GMT
- Title: MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
- Authors: Hao Hao Tan, Kin Wai Cheuk, Taemin Cho, Wei-Hsiang Liao, Yuki Mitsufuji
- Abstract summary: This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model.
We propose MR-MT3, with enhancements including a memory retention mechanism, prior token sampling, and token shuffling.
These methods are evaluated on the Slakh2100 dataset, demonstrating improved onset F1 scores and reduced instrument leakage.
- Score: 15.856435702348977
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model. Despite SOTA performance, MT3 has the issue of instrument leakage, where transcriptions are fragmented across different instruments. To mitigate this, we propose MR-MT3, with enhancements including a memory retention mechanism, prior token sampling, and token shuffling. These methods are evaluated on the Slakh2100 dataset, demonstrating improved onset F1 scores and reduced instrument leakage. In addition to the conventional multi-instrument transcription F1 score, new metrics such as the instrument leakage ratio and the instrument detection F1 score are introduced for a more comprehensive assessment of transcription quality. The study also explores the issue of domain overfitting by evaluating MT3 on single-instrument monophonic datasets such as ComMU and NSynth. The findings, along with the source code, are shared to facilitate future work aimed at refining token-based multi-instrument AMT models.
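The abstract names two new evaluation metrics without defining them here. Below is a minimal Python sketch of one plausible reading: instrument detection F1 computed over the set of instruments present in the estimate versus the reference, and an instrument leakage ratio comparing how many distinct instruments appear in each. Both definitions are illustrative assumptions, not the exact formulations used by MR-MT3.

```python
# Illustrative sketch only: assumed definitions of the two metrics named in the
# abstract (instrument detection F1, instrument leakage ratio). The exact
# formulations used by MR-MT3 may differ.

def instrument_detection_f1(ref_instruments, est_instruments):
    """F1 over the set of instruments present in the reference vs. the estimate."""
    ref, est = set(ref_instruments), set(est_instruments)
    tp = len(ref & est)
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def instrument_leakage_ratio(ref_instruments, est_instruments):
    """Distinct instruments in the estimate relative to the reference;
    values above 1 suggest notes leaking into spurious instrument tracks."""
    ref, est = set(ref_instruments), set(est_instruments)
    return len(est) / max(len(ref), 1)

ref = ["piano", "bass", "drums"]                       # 3 instruments in the reference
est = ["piano", "bass", "drums", "guitar", "strings"]  # 5 in the transcription
print(instrument_detection_f1(ref, est))   # 0.75
print(instrument_leakage_ratio(ref, est))  # ~1.67 (leakage)
```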
Related papers
- YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation [15.9795868183084]
Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument.
This paper introduces YourMT3+, a suite of models for enhanced multi-instrument music transcription.
Our experiments demonstrate direct vocal transcription capabilities, eliminating the need for voice separation pre-processors.
arXiv Detail & Related papers (2024-07-05T19:18:33Z)
- Exploring a Test Data-Driven Method for Selecting and Constraining Metamorphic Relations [46.889513596156185]
This paper presents a preliminary evaluation of MetaTrimmer, a method for selecting and constraining Metamorphic Relations based on test data.
The novelty of MetaTrimmer is its avoidance of complex prediction models that require labeled datasets regarding the applicability of MRs.
In a preliminary evaluation, MetaTrimmer shows the potential to overcome existing limitations and enhance MR effectiveness.
arXiv Detail & Related papers (2023-07-28T12:27:34Z)
- Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training [18.391476887027583]
Jointist is an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip.
Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.
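As a rough schematic of the conditioning pattern described above (an instrument recognition module whose output conditions the transcription module), here is a hedged PyTorch sketch; module names, dimensions, and wiring are hypothetical and not taken from Jointist.

```python
# Hedged sketch of instrument-conditioned transcription: clip-level instrument
# probabilities are concatenated to the frame features before transcription.
# All shapes and module choices are placeholders, not the Jointist architecture.
import torch
import torch.nn as nn

class ConditionedTranscriber(nn.Module):
    def __init__(self, n_mels=229, n_instruments=39, hidden=256, n_pitches=88):
        super().__init__()
        self.recognizer = nn.Sequential(nn.Linear(n_mels, hidden), nn.ReLU(),
                                        nn.Linear(hidden, n_instruments))
        self.transcriber = nn.GRU(n_mels + n_instruments, hidden, batch_first=True)
        self.pitch_head = nn.Linear(hidden, n_pitches)

    def forward(self, mel):                              # mel: (batch, time, n_mels)
        inst_logits = self.recognizer(mel.mean(dim=1))   # clip-level instrument logits
        inst_probs = torch.sigmoid(inst_logits)          # multi-label probabilities
        cond = inst_probs.unsqueeze(1).expand(-1, mel.size(1), -1)
        h, _ = self.transcriber(torch.cat([mel, cond], dim=-1))
        piano_roll = torch.sigmoid(self.pitch_head(h))   # (batch, time, n_pitches)
        return inst_probs, piano_roll
```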
arXiv Detail & Related papers (2023-02-01T07:35:02Z)
- The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers [59.87030906486969]
This paper studies the curious phenomenon that, in machine learning models with Transformer architectures, the activation maps are sparse.
We show that sparsity is a prevalent phenomenon that occurs for both natural language processing and vision tasks.
We discuss how sparsity immediately implies a way to significantly reduce the FLOP count and improve efficiency for Transformers.
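A minimal sketch of how activation sparsity can be measured, assuming a ReLU MLP block and counting the fraction of exactly-zero activations; this is illustrative and not the paper's measurement protocol.

```python
# Illustrative measurement of activation sparsity in a Transformer-style MLP:
# the fraction of post-ReLU units that are exactly zero.
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048
mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

x = torch.randn(32, 128, d_model)          # (batch, tokens, d_model)
with torch.no_grad():
    hidden = mlp[1](mlp[0](x))             # activations right after the ReLU
sparsity = (hidden == 0).float().mean().item()
print(f"fraction of zero activations: {sparsity:.2%}")
# For a randomly initialised MLP this is roughly 50%; the paper reports much
# higher sparsity emerging in trained Transformers, which is what enables the
# FLOP savings mentioned above.
```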
arXiv Detail & Related papers (2022-10-12T15:25:19Z)
- Jointist: Joint Learning for Multi-instrument Transcription and Its Applications [15.921536323391226]
Jointist is an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip.
Jointist consists of the instrument recognition module that conditions the other modules: the transcription module that outputs instrument-specific piano rolls, and the source separation module that utilizes instrument information and transcription results.
arXiv Detail & Related papers (2022-06-22T02:03:01Z)
- Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition [68.68300358332156]
We propose the sparse conditional hidden Markov model (Sparse-CHMM) to evaluate noisy labeling functions.
Sparse-CHMM is optimized through unsupervised learning with a three-stage training pipeline.
It achieves a 3.01 average F1 score improvement on five comprehensive datasets.
arXiv Detail & Related papers (2022-05-27T20:47:30Z)
- A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation [6.131772929312604]
We propose a lightweight neural network for musical instrument transcription.
Our model is trained to jointly predict frame-wise onsets, multipitch and note activations.
Benchmark results show that our system's note estimation is substantially better than a comparable baseline.
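As a schematic of the joint frame-wise prediction described above, the sketch below uses one shared backbone with three sigmoid heads (onsets, multipitch, note activations); the layer choices are placeholders rather than the paper's architecture.

```python
# Hedged sketch of joint frame-wise prediction: a shared backbone feeding
# separate onset, multipitch, and note-activation heads. Layer sizes and the
# backbone are illustrative assumptions.
import torch
import torch.nn as nn

class JointNoteEstimator(nn.Module):
    def __init__(self, n_bins=264, hidden=128, n_pitches=88):
        super().__init__()
        self.backbone = nn.Conv1d(n_bins, hidden, kernel_size=3, padding=1)
        self.onset_head = nn.Conv1d(hidden, n_pitches, kernel_size=1)
        self.pitch_head = nn.Conv1d(hidden, n_pitches, kernel_size=1)
        self.note_head = nn.Conv1d(hidden, n_pitches, kernel_size=1)

    def forward(self, spec):                          # spec: (batch, n_bins, time)
        h = torch.relu(self.backbone(spec))
        return (torch.sigmoid(self.onset_head(h)),    # frame-wise onsets
                torch.sigmoid(self.pitch_head(h)),    # multipitch posteriorgram
                torch.sigmoid(self.note_head(h)))     # note activations
```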
arXiv Detail & Related papers (2022-03-18T12:07:36Z)
- End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection.
A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region.
Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
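As a rough sketch of what a "3D max filtering" step over multi-scale features could look like (max filtering across a spatial window and neighbouring pyramid levels, after resizing all levels to a common grid), here is a hedged PyTorch version; the window sizes and the shared-resolution assumption are illustrative, not the exact 3DMF definition from the paper.

```python
# Hedged sketch: max filtering over a 3D neighbourhood spanning scale and space,
# so that only locally maximal predictions survive. Not the paper's exact 3DMF.
import torch
import torch.nn.functional as F

def max_filter_3d(score_maps, scale_window=3, spatial_window=3):
    """score_maps: (batch, channels, scales, H, W), e.g. FPN levels resized to a
    common resolution. Returns max-filtered maps of the same shape."""
    pad = (spatial_window // 2, spatial_window // 2,   # W
           spatial_window // 2, spatial_window // 2,   # H
           scale_window // 2, scale_window // 2)       # scale axis
    padded = F.pad(score_maps, pad, mode="replicate")
    return F.max_pool3d(padded,
                        kernel_size=(scale_window, spatial_window, spatial_window),
                        stride=1)

x = torch.rand(2, 1, 5, 32, 32)       # 5 pyramid levels on a shared 32x32 grid
print(max_filter_3d(x).shape)         # torch.Size([2, 1, 5, 32, 32])
```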
arXiv Detail & Related papers (2020-12-07T09:14:55Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
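The ideal-ratio-mask oracle can be sketched directly: given the ground-truth stems, a soft mask built from their magnitude spectrograms yields an upper-bound estimate of how separable an instrument is, without training a network. The STFT settings and the SDR-style score below are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch of the ideal-ratio-mask (IRM) oracle: mask the mixture with
# |S| / (|S| + |N|) and score how close the result is to the target stem.
import numpy as np

def irm_oracle_estimate(target, others, n_fft=2048, hop=512, eps=1e-8):
    """target, others: mono waveforms of the target stem and the rest of the mix."""
    def stft(x):
        frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
        return np.fft.rfft(frames * np.hanning(n_fft), axis=-1)

    S, N = stft(target), stft(others)
    irm = np.abs(S) / (np.abs(S) + np.abs(N) + eps)   # ideal ratio mask
    est = irm * (S + N)                                # oracle-separated target
    # Simple SDR-style proxy in the time-frequency domain as a separability score.
    num = np.sum(np.abs(S) ** 2)
    den = np.sum(np.abs(S - est) ** 2) + eps
    return 10 * np.log10(num / den)

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
print(irm_oracle_estimate(np.sin(2 * np.pi * 440 * t),     # target: a pure tone
                          0.3 * np.random.randn(sr)))      # "others": noise
```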
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Transfer Learning for Motor Imagery Based Brain-Computer Interfaces: A Complete Pipeline [54.73337667795997]
Transfer learning (TL) has been widely used in motor imagery (MI) based brain-computer interfaces (BCIs) to reduce the calibration effort for a new subject.
This paper proposes that TL could be considered in all three components (spatial filtering, feature engineering, and classification) of MI-based BCIs.
arXiv Detail & Related papers (2020-07-03T23:44:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.