FretNet: Continuous-Valued Pitch Contour Streaming for Polyphonic Guitar
Tablature Transcription
- URL: http://arxiv.org/abs/2212.03023v1
- Date: Tue, 6 Dec 2022 14:51:27 GMT
- Authors: Frank Cwitkowitz, Toni Hirvonen, Anssi Klapuri
- Abstract summary: In certain applications, such as Guitar Tablature Transcription (GTT), it is more meaningful to estimate continuous-valued pitch contours.
We present a GTT formulation that estimates continuous-valued pitch contours, grouping them according to their string and fret of origin.
We demonstrate that for this task, the proposed method significantly improves the resolution of MPE and simultaneously yields tablature estimation results competitive with baseline models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the task of Automatic Music Transcription (AMT), whereby
various attributes of music notes are estimated from audio, has received
increasing attention. At the same time, the related task of Multi-Pitch
Estimation (MPE) remains a challenging but necessary component of almost all
AMT approaches, even if only implicitly. In the context of AMT, pitch
information is typically quantized to the nominal pitches of the Western music
scale. Even in more general contexts, MPE systems typically produce pitch
predictions with some degree of quantization. In certain applications of AMT,
such as Guitar Tablature Transcription (GTT), it is more meaningful to estimate
continuous-valued pitch contours. Guitar tablature has the capacity to
represent various playing techniques, some of which involve pitch modulation.
Contemporary approaches to AMT do not adequately address pitch modulation, and
offer reduced quantization only at the expense of greater model complexity. In this
paper, we present a GTT formulation that estimates continuous-valued pitch
contours, grouping them according to their string and fret of origin. We
demonstrate that for this task, the proposed method significantly improves the
resolution of MPE and simultaneously yields tablature estimation results
competitive with baseline models.
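To make the formulation concrete, the sketch below illustrates one way continuous-valued pitch contours could be grouped by string and fret: each (string, fret) pair carries a quantized nominal pitch, and a continuous offset (capturing bends, slides, and vibrato) is added on top. This is a minimal illustration under assumed shapes and names, not the FretNet architecture; the tuning constant, threshold, and function are hypothetical.

```python
import numpy as np

# Hypothetical decoder illustrating string/fret-grouped, continuous-valued
# pitch contours. All names, shapes, and constants are illustrative
# assumptions, not the FretNet implementation.

OPEN_STRING_MIDI = np.array([40, 45, 50, 55, 59, 64])  # standard tuning: E2 A2 D3 G3 B3 E4

def decode_contours(fret_activations, pitch_offsets, threshold=0.5):
    """Turn per-string, per-fret outputs into continuous pitch contours.

    fret_activations: (6, F, T) activation probabilities over F fret classes
                      and T time frames for each of the 6 strings.
    pitch_offsets:    (6, F, T) continuous deviations, in semitones, from each
                      fret's nominal pitch (captures bends, slides, vibrato).
    Returns a list of (string, fret, frame, midi_pitch) tuples, where
    midi_pitch is continuous-valued rather than quantized to the nominal
    pitches of the Western scale.
    """
    contours = []
    for s, f, t in zip(*np.nonzero(fret_activations >= threshold)):
        nominal = OPEN_STRING_MIDI[s] + f               # quantized (string, fret) pitch
        contours.append((s, f, t, nominal + pitch_offsets[s, f, t]))
    return contours
```

For example, an activation at string index 2 (the D string, open MIDI pitch 50) at fret 3 with a predicted offset of +0.4 decodes to 50 + 3 + 0.4 = 53.4, a note bent slightly sharp of F3.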
Related papers
- YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation (arXiv, 2024-07-05)
Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument.
This paper introduces YourMT3+, a suite of models for enhanced multi-instrument music transcription.
Our experiments demonstrate direct vocal transcription capabilities, eliminating the need for voice-separation pre-processors.

- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages (arXiv, 2024-02-25)
The Tri-Modal Translation (TMT) model translates between arbitrary modalities spanning speech, image, and text.
We tokenize speech and image data into discrete tokens, which provide a unified interface across modalities (a minimal sketch of this idea follows the list below).
TMT consistently outperforms its single-model counterparts.

- Multitrack Music Transcription with a Time-Frequency Perceiver (arXiv, 2023-06-19)
Multitrack music transcription aims to transcribe a music audio input into the musical notes of multiple instruments simultaneously.
We propose a novel deep neural network architecture, Perceiver TF, to model the time-frequency representation of the audio input for multitrack transcription.

- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training (arXiv, 2023-05-31)
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training.
Experimental results indicate that our model generalises well across 14 music understanding tasks and attains state-of-the-art (SOTA) overall scores.

- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis (arXiv, 2023-05-18)
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid time-consuming phoneme duration annotation and complicated phoneme-level mel-note alignment.

- STT: Soft Template Tuning for Few-Shot Adaptation (arXiv, 2022-07-18)
We propose a new prompt-tuning framework, called Soft Template Tuning (STT).
STT combines manual and auto prompts, and treats downstream classification tasks as a masked language modeling task.
It can even outperform time- and resource-consuming fine-tuning on sentiment classification tasks.

- Multitrack Music Transformer (arXiv, 2022-07-14)
We propose a new multitrack music representation that accommodates a diverse set of instruments while keeping the sequence length short.
Our proposed Multitrack Music Transformer (MMT) achieves performance comparable with state-of-the-art systems.

- Unaligned Supervision For Automatic Music Transcription in The Wild (arXiv, 2022-04-28)
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances.
We report SOTA note-level accuracy on the MAPS dataset, and large favorable margins in cross-dataset evaluations.

- MT3: Multi-Task Multitrack Music Transcription (arXiv, 2021-11-04)
We show that a general-purpose Transformer model can perform multi-task Automatic Music Transcription (AMT), and that this unified training framework achieves high-quality transcription results across a range of datasets.

- A framework to compare music generative models using automatic evaluation metrics extended to rhythm (arXiv, 2021-01-19)
This paper builds on a framework proposed in previous research that did not consider rhythm, adding rhythm support in order to evaluate the performance of two RNN memory cells in the generation of monophonic music.
The model handles music transposition, and the framework evaluates the quality of the generated pieces using automatic, geometry-based quantitative metrics that are likewise extended with rhythm support.

- PopMAG: Pop Music Accompaniment Generation (arXiv, 2020-08-18)
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
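As referenced in the TMT entry above, a shared discrete-token interface across modalities can be sketched as follows; the vocabulary sizes, offsets, and function are illustrative assumptions, not the TMT implementation.

```python
# Minimal sketch of a unified discrete-token interface across modalities.
# Vocabulary sizes and offsets are illustrative assumptions.

TEXT_VOCAB = 32_000    # e.g., subword token ids
SPEECH_VOCAB = 1_024   # e.g., codes from a speech quantizer
IMAGE_VOCAB = 8_192    # e.g., codes from an image VQ model

SPEECH_OFFSET = TEXT_VOCAB
IMAGE_OFFSET = TEXT_VOCAB + SPEECH_VOCAB

def to_unified_ids(modality, raw_ids):
    """Map modality-specific token ids into one shared id space, so a single
    sequence model can translate between any pair of modalities."""
    offset = {"text": 0, "speech": SPEECH_OFFSET, "image": IMAGE_OFFSET}[modality]
    return [offset + i for i in raw_ids]
```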
This list is automatically generated from the titles and abstracts of the papers on this site.