Transfer of knowledge among instruments in automatic music transcription
- URL: http://arxiv.org/abs/2305.00426v1
- Date: Sun, 30 Apr 2023 08:37:41 GMT
- Title: Transfer of knowledge among instruments in automatic music transcription
- Authors: Michał Leś and Michał Woźniak
- Abstract summary: This work shows how to employ easily generated synthesized audio data produced by software synthesizers to train a universal model.
It is a good base for further transfer learning to quickly adapt the transcription model to other instruments.
- Score: 2.0305676256390934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic music transcription (AMT) is one of the most challenging tasks in
the music information retrieval domain. It is the process of converting an
audio recording of music into a symbolic representation containing information
about the notes, chords, and rhythm. Current research in this domain focuses on
developing new models based on the transformer architecture or on
semi-supervised training methods, which give outstanding results, but the
computational cost of training such models is enormous.
This work shows how to employ easily generated synthesized audio data
produced by software synthesizers to train a universal model, which serves as a
good base for further transfer learning to quickly adapt the transcription
model to other instruments. The achieved results show that training on
synthesized data may be a good starting point for pretraining general-purpose
models, where the transcription task is not focused on a single instrument.
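The core idea of the abstract can be illustrated with a toy sketch (this is an assumption-laden simplification, not the paper's actual pipeline, which renders audio with real software synthesizers): any symbolic note rendered to audio by a synthesizer automatically comes with a perfect ground-truth label, so paired training data is free to generate.

```python
import math

def midi_to_hz(pitch: int) -> float:
    """Convert a MIDI pitch number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

def synthesize_note(pitch: int, duration_s: float = 0.5, sample_rate: int = 16000):
    """Render one note as a plain sine wave -- a stand-in for a software synthesizer."""
    freq = midi_to_hz(pitch)
    n_samples = int(duration_s * sample_rate)
    return [math.sin(2.0 * math.pi * freq * n / sample_rate) for n in range(n_samples)]

def make_training_pairs(pitches):
    """Each synthesized clip is paired with its ground-truth pitch label for free."""
    return [(synthesize_note(p), p) for p in pitches]

# A C-major triad yields three labeled (audio, pitch) training examples.
pairs = make_training_pairs([60, 64, 67])
```

A real pipeline would swap the sine generator for a SoundFont-based synthesizer and emit full note-event annotations rather than a single pitch, but the labeling-for-free property is the same.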
Related papers
- Audio-to-Score Conversion Model Based on Whisper methodology [0.0]
This thesis introduces the "Orpheus' Score", a custom notation system that converts music information into tokens.
Experiments show that compared to traditional algorithms, the model has significantly improved accuracy and performance.
arXiv Detail & Related papers (2024-10-22T17:31:37Z) - YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation [15.9795868183084]
Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument.
This paper introduces YourMT3+, a suite of models for enhanced multi-instrument music transcription.
Our experiments demonstrate direct vocal transcription capabilities, eliminating the need for voice separation pre-processors.
arXiv Detail & Related papers (2024-07-05T19:18:33Z) - Annotation-free Automatic Music Transcription with Scalable Synthetic Data and Adversarial Domain Confusion [0.0]
We propose a transcription model that is pre-trained without any paired MIDI-audio data, using scalable synthetic data and adversarial domain confusion.
In experiments, we evaluate methods under the real-world application scenario where training datasets do not include the MIDI annotation of audio.
Our proposed method achieved competitive performance relative to established baseline methods, despite not utilizing any real datasets of paired MIDI-audio.
arXiv Detail & Related papers (2023-12-16T10:07:18Z) - Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription [19.228155694144995]
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction.
We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients.
We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods.
arXiv Detail & Related papers (2023-09-27T15:19:05Z) - Multitrack Music Transcription with a Time-Frequency Perceiver [6.617487928813374]
Multitrack music transcription aims to transcribe a music audio input into the musical notes of multiple instruments simultaneously.
We propose a novel deep neural network architecture, Perceiver TF, to model the time-frequency representation of audio input for multitrack transcription.
arXiv Detail & Related papers (2023-06-19T08:58:26Z) - Melody transcription via generative pre-training [86.08508957229348]
A key challenge in melody transcription is building methods that can handle broad audio containing any number of instrument ensembles and musical styles.
To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio.
We derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music.
arXiv Detail & Related papers (2022-12-04T18:09:23Z) - Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z) - Music Instrument Classification Reprogrammed [79.68916470119743]
"Reprogramming" is a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the input and output of the pre-trained model.
We demonstrate that reprogramming can effectively leverage the power of the representation learned for a different task and that the resulting reprogrammed system can perform on par or even outperform state-of-the-art systems at a fraction of training parameters.
arXiv Detail & Related papers (2022-11-15T18:26:01Z) - Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy for estimating the separation performance of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z) - Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method utilizes an acoustic model, trained for the task of automatic speech recognition, together with extracted melody features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z) - VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.