Melody transcription via generative pre-training
- URL: http://arxiv.org/abs/2212.01884v1
- Date: Sun, 4 Dec 2022 18:09:23 GMT
- Title: Melody transcription via generative pre-training
- Authors: Chris Donahue, John Thickstun, Percy Liang
- Abstract summary: A key challenge in melody transcription is building methods that can handle broad audio containing any number of instrument ensembles and musical styles.
To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio.
We derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music.
- Score: 86.08508957229348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the central role that melody plays in music perception, it remains an
open challenge in music information retrieval to reliably detect the notes of
the melody present in an arbitrary music recording. A key challenge in melody
transcription is building methods which can handle broad audio containing any
number of instrument ensembles and musical styles - existing strategies work
well for some melody instruments or styles but not all. To confront this
challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a
generative model of broad music audio, thereby improving performance on melody
transcription by 20% relative to conventional spectrogram features. Another
obstacle in melody transcription is a lack of training data - we derive a new
dataset containing 50 hours of melody transcriptions from crowdsourced
annotations of broad music. The combination of generative pre-training and a
new dataset for this task results in 77% stronger performance on melody
transcription relative to the strongest available baseline. By pairing our new
melody transcription approach with solutions for beat detection, key
estimation, and chord recognition, we build Sheet Sage, a system capable of
transcribing human-readable lead sheets directly from music audio.
Audio examples can be found at https://chrisdonahue.com/sheetsage and code at
https://github.com/chrisdonahue/sheetsage .
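For intuition only, below is a minimal sketch of the recipe the abstract describes: per-frame representations from a frozen pretrained generative audio model are fed to a lightweight transcription head trained on the crowdsourced melody labels. This is an assumption-laden illustration in PyTorch, not the Sheet Sage implementation; the feature dimension, head architecture, and label scheme are placeholders, and real Jukebox activations would replace the random tensors.

```python
# Minimal sketch (not the authors' code): frozen per-frame features from a
# pretrained generative audio model feed a small transcription head that
# predicts one melody pitch (or "no note") per frame. Names and dimensions
# are illustrative assumptions; random tensors stand in for real features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MelodyTranscriptionHead(nn.Module):
    """Maps per-frame features from a frozen pretrained model to note logits."""

    def __init__(self, feature_dim=4800, hidden_dim=512, num_pitches=128):
        super().__init__()
        self.proj = nn.Linear(feature_dim, hidden_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        # One class per MIDI pitch plus an extra "no melody note" class.
        self.out = nn.Linear(hidden_dim, num_pitches + 1)

    def forward(self, features):              # features: (batch, frames, feature_dim)
        x = self.proj(features)
        x = self.encoder(x)
        return self.out(x)                    # (batch, frames, num_pitches + 1)


# Placeholder inputs: in the paper these frames would be activations extracted
# from a frozen generative model (Jukebox) rather than spectrogram features.
batch, frames, feature_dim = 2, 100, 4800
features = torch.randn(batch, frames, feature_dim)
targets = torch.randint(0, 129, (batch, frames))   # framewise melody labels

head = MelodyTranscriptionHead(feature_dim=feature_dim)
logits = head(features)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()
print(f"training loss: {loss.item():.3f}")
```

Per the abstract, the key design choice is the input representation: replacing conventional spectrogram features with representations from the generative model is what drives the reported 20% relative improvement.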
Related papers
- Cluster and Separate: a GNN Approach to Voice and Staff Prediction for Score Engraving [5.572472212662453]
This paper approaches the problem of separating the notes from a quantized symbolic music piece (e.g., a MIDI file) into multiple voices and staves.
We propose an end-to-end system based on graph neural networks that clusters notes belonging to the same chord and connects them with edges if they are part of the same voice.
arXiv Detail & Related papers (2024-07-15T14:36:13Z) - Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription [19.228155694144995]
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction.
We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients.
We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods.
arXiv Detail & Related papers (2023-09-27T15:19:05Z) - Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z) - RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
Realistic-music-score based singing voice synthesis (RMS-SVS) aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z) - Unsupervised Melody-Guided Lyrics Generation [84.22469652275714]
We propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data.
We leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process.
arXiv Detail & Related papers (2023-05-12T20:57:20Z) - Transfer of knowledge among instruments in automatic music transcription [2.0305676256390934]
This work shows how to use easily generated synthetic audio data produced by software synthesizers to train a universal model.
The resulting model is a good base for further transfer learning, allowing the transcription model to be quickly adapted to other instruments.
arXiv Detail & Related papers (2023-04-30T08:37:41Z) - Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z) - Unaligned Supervision For Automatic Music Transcription in The Wild [1.2183405753834562]
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances.
We report SOTA note-level accuracy on the MAPS dataset, and large favorable margins on cross-dataset evaluations.
arXiv Detail & Related papers (2022-04-28T17:31:43Z) - Multi-Channel Automatic Music Transcription Using Tensor Algebra [0.0]
This report aims to develop some of the existing techniques for music transcription.
It will also introduce the concept of multi-channel automatic music transcription.
arXiv Detail & Related papers (2021-07-23T14:07:40Z) - Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)