POP909: A Pop-song Dataset for Music Arrangement Generation
- URL: http://arxiv.org/abs/2008.07142v1
- Date: Mon, 17 Aug 2020 08:08:14 GMT
- Title: POP909: A Pop-song Dataset for Music Arrangement Generation
- Authors: Ziyu Wang, Ke Chen, Junyan Jiang, Yiyi Zhang, Maoran Xu, Shuqi Dai,
Xianbin Gu, Gus Xia
- Abstract summary: We propose POP909, a dataset which contains multiple versions of the piano arrangements of 909 popular songs created by professional musicians.
The main body of the dataset contains the vocal melody, the lead instrument melody, and the piano accompaniment for each song in MIDI format, which are aligned to the original audio files.
We provide the annotations of tempo, beat, key, and chords, where the tempo curves are hand-labeled and others are done by MIR algorithms.
- Score: 10.0454303747519
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music arrangement generation is a subtask of automatic music generation,
which involves reconstructing and re-conceptualizing a piece with new
compositional techniques. Such a generation process inevitably requires
reference from the original melody, chord progression, or other structural
information. Despite some promising models for arrangement, they lack more
refined data to achieve better evaluations and more practical results. In this
paper, we propose POP909, a dataset which contains multiple versions of the
piano arrangements of 909 popular songs created by professional musicians. The
main body of the dataset contains the vocal melody, the lead instrument melody,
and the piano accompaniment for each song in MIDI format, which are aligned to
the original audio files. Furthermore, we provide the annotations of tempo,
beat, key, and chords, where the tempo curves are hand-labeled and others are
done by MIR algorithms. Finally, we conduct several baseline experiments with
this dataset using standard deep music generation algorithms.
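To make the described layout concrete, here is a minimal sketch of how one song's three aligned parts and its beat annotation might be held in memory, and how note onsets in seconds could be mapped onto the beat grid. The track names (`melody`, `lead`, `piano`), the `Note` class, the helper functions, and the toy values are all hypothetical illustrations, not the dataset's actual file format or API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Note:
    start: float  # onset time in seconds
    end: float    # offset time in seconds
    pitch: int    # MIDI pitch number (0-127)


def beat_position(time_sec, beat_times):
    """Return the fractional beat index of time_sec on a sorted beat grid.

    Times between two beats are linearly interpolated; times outside the
    grid are extrapolated from the nearest beat interval.
    """
    i = bisect_right(beat_times, time_sec) - 1
    i = max(0, min(i, len(beat_times) - 2))  # clamp to a valid interval
    span = beat_times[i + 1] - beat_times[i]
    return i + (time_sec - beat_times[i]) / span


def onsets_in_beats(song, beat_times):
    """Convert every note onset in every track from seconds to beats."""
    return {
        name: [beat_position(note.start, beat_times) for note in notes]
        for name, notes in song.items()
    }


# Toy data (assumed for illustration): three parts plus a beat annotation.
song = {
    "melody": [Note(0.5, 1.0, 60), Note(1.0, 1.5, 62)],  # vocal melody
    "lead": [Note(2.0, 2.5, 67)],                        # lead instrument melody
    "piano": [Note(0.0, 2.0, 48), Note(0.0, 2.0, 55)],   # piano accompaniment
}
beat_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]  # beat onsets in seconds

beats = onsets_in_beats(song, beat_times)
```

Working in beat units rather than seconds is one common choice for symbolic generation models, since it removes tempo variation from the note timing.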
Related papers
- PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text [8.382511298208003]
PIAST (PIano dataset with Audio, Symbolic, and Text) is a piano music dataset.
We collected 9,673 tracks from YouTube and added human annotations for 2,023 tracks by music experts.
Both include audio, text, tag annotations, and transcribed MIDI utilizing state-of-the-art piano transcription and beat tracking models.
arXiv Detail & Related papers (2024-11-04T19:34:13Z)
- MidiCaps: A large-scale MIDI dataset with text captions [6.806050368211496]
This work aims to enable research that combines LLMs with symbolic music by presenting MidiCaps, the first openly available large-scale MIDI dataset with text captions.
Inspired by recent advancements in captioning techniques, we present a curated dataset of over 168k MIDI files with textual descriptions.
arXiv Detail & Related papers (2024-06-04T12:21:55Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models [42.2977676825086]
In this paper, we develop InstructME, an Instruction guided Music Editing and remixing framework based on latent diffusion models.
Our framework fortifies the U-Net with multi-scale aggregation in order to maintain consistency before and after editing.
Our proposed method significantly surpasses preceding systems in music quality, text relevance and harmony.
arXiv Detail & Related papers (2023-08-28T07:11:42Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z)
- Melody transcription via generative pre-training [86.08508957229348]
A key challenge in melody transcription is building methods that can handle broad audio containing any number of instrument ensembles and musical styles.
To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio.
We derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music.
arXiv Detail & Related papers (2022-12-04T18:09:23Z)
- ComMU: Dataset for Combinatorial Music Generation [20.762884001498627]
Combinatorial music generation creates short samples of music with rich musical metadata and combines them to produce a complete piece of music.
ComMU is the first symbolic music dataset consisting of short music samples and their corresponding 12 musical metadata fields.
Our results show that we can generate diverse, high-quality music with metadata alone, and that our unique metadata, such as track-role and extended chord quality, improves the capacity of automatic composition.
arXiv Detail & Related papers (2022-11-17T07:25:09Z)
- Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z)
- Controllable deep melody generation via hierarchical music structure representation [14.891975420982511]
MusicFrameworks is a hierarchical music structure representation and a multi-step generative process to create a full-length melody.
To generate melody in each phrase, we generate rhythm and basic melody using two separate transformer-based networks.
To customize or add variety, one can alter chords, basic melody, and rhythm structure in the music frameworks, letting our networks generate the melody accordingly.
arXiv Detail & Related papers (2021-09-02T01:31:14Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our pop music accompaniment generation system PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.