Semi-Supervised Convolutive NMF for Automatic Music Transcription
- URL: http://arxiv.org/abs/2202.04989v1
- Date: Thu, 10 Feb 2022 12:38:53 GMT
- Title: Semi-Supervised Convolutive NMF for Automatic Music Transcription
- Authors: Haoran Wu, Axel Marmoret, Jérémy E. Cohen
- Abstract summary: We propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization.
We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and only slightly worse than state-of-the-art supervised deep learning methods.
- Score: 6.583111368144214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Music Transcription, which consists in transforming an audio
recording of a musical performance into symbolic format, remains a difficult
Music Information Retrieval task. In this work, we propose a semi-supervised
approach using low-rank matrix factorization techniques, in particular
Convolutive Nonnegative Matrix Factorization. In the semi-supervised setting,
only a single recording of each individual note is required.
We show on the MAPS dataset that the proposed semi-supervised CNMF method
performs better than state-of-the-art low-rank factorization techniques and
slightly worse than state-of-the-art supervised deep learning methods, while
suffering from generalization issues.
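The semi-supervised idea described in the abstract can be sketched as follows: note templates are learned once from isolated-note recordings and then held fixed, so transcription reduces to estimating nonnegative activations under a convolutive NMF model. The sketch below is a minimal illustration under assumed simplifications, not the paper's implementation: it uses a Euclidean cost with multiplicative updates, and the function and parameter names (`cnmf_activations`, `W`, `H`) are hypothetical.

```python
import numpy as np

def shift_right(H, t):
    """Shift H right by t columns, zero-padding on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out

def shift_left(A, t):
    """Shift A left by t columns, zero-padding on the right."""
    if t == 0:
        return A.copy()
    out = np.zeros_like(A)
    out[:, :-t] = A[:, t:]
    return out

def cnmf_activations(X, W, n_iter=200, eps=1e-12):
    """Estimate activations H in the convolutive NMF model
        X ≈ sum_t W[t] @ shift_right(H, t)
    with the templates W held fixed (the semi-supervised setting:
    W is pre-learned from isolated-note recordings).

    X : (F, N) nonnegative magnitude spectrogram
    W : (T, F, K) templates, one (F, K) slice per time offset t
    Returns H : (K, N) nonnegative note activations.
    """
    T, F, K = W.shape
    N = X.shape[1]
    H = np.random.rand(K, N)
    for _ in range(n_iter):
        # Current reconstruction of the spectrogram.
        X_hat = sum(W[t] @ shift_right(H, t) for t in range(T))
        # Multiplicative update minimizing ||X - X_hat||_F^2;
        # shifting left "undoes" the model's right shift in the gradient.
        num = sum(W[t].T @ shift_left(X, t) for t in range(T))
        den = sum(W[t].T @ shift_left(X_hat, t) for t in range(T))
        H *= num / (den + eps)
    return H
```

Peaks in the rows of `H` can then be thresholded into note onsets. Note that the actual paper may use a different divergence and update scheme; this only illustrates the fixed-template, activation-only optimization that makes the approach semi-supervised.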
Related papers
- Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators [83.48423407316713]
We present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.
Our model initiates the denoising process with a precise, non-ambiguous stage and gradually transitions to a phase enriched with detail.
Our method achieves a state-of-the-art FID score of 2.01 when integrated with the recent work SiT.
arXiv Detail & Related papers (2024-08-11T07:01:39Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- Machine Learning Techniques in Automatic Music Transcription: A Systematic Survey [2.4895506645605123]
This systematic review accentuates the pivotal role of Automatic Music Transcription (AMT) in music signal analysis.
Despite notable advancements, AMT systems have yet to match the accuracy of human experts.
By addressing the limitations of prior techniques and suggesting avenues for improvement, our objective is to steer future research towards fully automated AMT systems.
arXiv Detail & Related papers (2024-06-20T03:48:15Z)
- End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music [12.779526750915707]
We present the first truly end-to-end approach for page-level Optical Music Recognition.
Our system processes an entire music score page and outputs a complete transcription in a music encoding format.
The results demonstrate that our system not only successfully transcribes full-page music scores but also outperforms the commercial tool in both zero-shot settings and after fine-tuning with the target domain.
arXiv Detail & Related papers (2024-05-20T15:21:48Z)
- Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription [13.960714900433269]
Sheet Music Transformer is the first end-to-end OMR model designed to transcribe complex musical scores without relying solely on monophonic strategies.
Our model has been tested on two polyphonic music datasets and has proven capable of handling these intricate music structures effectively.
arXiv Detail & Related papers (2024-02-12T11:52:21Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide range of applications for music generation, including inpainting, outpainting, and looping, as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
- Masked Audio Generation using a Single Non-Autoregressive Transformer [90.11646612273965]
MAGNeT is a masked generative sequence modeling method that operates directly over several streams of audio tokens.
We demonstrate the efficiency of MAGNeT for the task of text-to-music and text-to-audio generation.
We shed light on the importance of each of the components comprising MAGNeT, together with pointing to the trade-offs between autoregressive and non-autoregressive modeling.
arXiv Detail & Related papers (2024-01-09T14:29:39Z)
- Music Instrument Classification Reprogrammed [79.68916470119743]
"Reprogramming" is a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the input and output of the pre-trained model.
We demonstrate that reprogramming can effectively leverage the power of the representation learned for a different task and that the resulting reprogrammed system can perform on par or even outperform state-of-the-art systems at a fraction of training parameters.
arXiv Detail & Related papers (2022-11-15T18:26:01Z)
- Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data.
Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably.
We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z)
- Unaligned Supervision For Automatic Music Transcription in The Wild [1.2183405753834562]
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances.
We report SOTA note-level accuracy on the MAPS dataset, and large favorable margins on cross-dataset evaluations.
arXiv Detail & Related papers (2022-04-28T17:31:43Z)
- MT3: Multi-Task Multitrack Music Transcription [7.5947187537718905]
We show that a general-purpose Transformer model can perform multi-task Automatic Music Transcription (AMT).
We show this unified training framework achieves high-quality transcription results across a range of datasets.
arXiv Detail & Related papers (2021-11-04T17:19:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.