GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework
- URL: http://arxiv.org/abs/2305.10841v2
- Date: Fri, 29 Sep 2023 11:05:19 GMT
- Title: GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework
- Authors: Ang Lv and Xu Tan and Peiling Lu and Wei Ye and Shikun Zhang and Jiang
Bian and Rui Yan
- Abstract summary: Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks."
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
- Score: 58.64512825534638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic music generation aims to create musical notes, which can help users
compose music, such as generating target instrument tracks based on provided
source tracks. In practical scenarios where there's a predefined ensemble of
tracks and various composition needs, an efficient and effective generative
model that can generate any target tracks based on the other tracks becomes
crucial. However, previous efforts have fallen short in addressing this
necessity due to limitations in their music representations and models. In this
paper, we introduce a framework known as GETMusic, with "GET" standing for
"GEnerate music Tracks." This framework encompasses a novel music
representation "GETScore" and a diffusion model "GETDiff." GETScore
represents musical notes as tokens and organizes tokens in a 2D structure, with
tracks stacked vertically and progressing horizontally over time. At a training
step, each track of a music piece is randomly selected as either the target or
source. The training involves two processes: In the forward process, target
tracks are corrupted by masking their tokens, while source tracks remain as the
ground truth; in the denoising process, GETDiff is trained to predict the
masked target tokens conditioning on the source tracks. Our proposed
representation, coupled with the non-autoregressive generative model, empowers
GETMusic to generate music with any arbitrary source-target track combinations.
Our experiments demonstrate that the versatile GETMusic outperforms prior works
proposed for certain specific composition tasks.
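The abstract above describes GETScore as a 2D grid of tokens with tracks stacked vertically and time running horizontally, and a training scheme in which target tracks are corrupted by masking while source tracks are kept as ground truth. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the token ids, grid size, and the `corrupt` helper are hypothetical placeholders.

```python
# Illustrative sketch (not the authors' code): a GETScore-like 2D token grid
# with tracks as rows and time steps as columns, plus the mask-based
# corruption used to set up source/target conditioning. Token ids and the
# MASK/PAD values are hypothetical placeholders.
import numpy as np

MASK_ID = 0          # hypothetical id for the mask token
PAD_ID = 1           # hypothetical id for "no note at this position"
NUM_TRACKS = 4       # e.g. melody, bass, drums, piano
NUM_STEPS = 16       # time positions (columns) in the score

rng = np.random.default_rng(seed=0)

# A toy GETScore-like grid: rows = tracks, columns = time steps.
# The real representation stores pitch/duration tokens per cell; here we
# just fill the grid with random ids greater than the special tokens.
score = rng.integers(low=2, high=100, size=(NUM_TRACKS, NUM_STEPS))

def corrupt(score: np.ndarray, target_tracks: list) -> np.ndarray:
    """Forward-process sketch: mask every token in the target tracks,
    leaving the source tracks as ground-truth conditioning."""
    corrupted = score.copy()
    corrupted[target_tracks, :] = MASK_ID
    return corrupted

# Example: generate track 2 (target) conditioned on tracks 0, 1, 3 (sources).
x_t = corrupt(score, target_tracks=[2])
print(x_t)

# A denoising model (GETDiff in the paper) would be trained to predict the
# masked tokens of row 2 given the unmasked source rows; since any subset of
# rows can be masked at inference time, the same model covers arbitrary
# source-target track combinations.
```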
Related papers
- MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT [44.204383306879095]
We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation.
To build a robust multitrack music generator, we fine-tune a pre-trained MusicBERT model to serve as the discriminator, and incorporate relativistic standard loss.
arXiv Detail & Related papers (2024-09-02T03:18:56Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation)
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation [20.733264277770154]
JEN-1 Composer is a unified framework to efficiently model marginal, conditional, and joint distributions over multi-track music.
We introduce a curriculum training strategy aimed at incrementally instructing the model in the transition from single-track generation to the flexible generation of multi-track combinations.
We demonstrate state-of-the-art performance in controllable and high-fidelity multi-track music synthesis.
arXiv Detail & Related papers (2023-10-29T22:51:49Z)
- MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies [32.482588500419006]
We build a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain.
We propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup.
In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music.
arXiv Detail & Related papers (2023-08-03T05:35:37Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns; an illustrative sketch of one such delay-style interleaving appears after this list.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Comparison Of Adversarial And Non-Adversarial LSTM Music Generative Models [2.569647910019739]
This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data.
The evaluation indicates that adversarial training produces more aesthetically pleasing music.
arXiv Detail & Related papers (2022-11-01T20:23:49Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
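The MusicGen entry above notes that a single-stage language model operates over several streams of discrete tokens via token interleaving. The sketch below illustrates one such pattern, a per-codebook delay in which stream k is shifted right by k steps; the shapes, pad value, and function name are assumptions made for this example rather than that paper's actual code.

```python
# Illustrative sketch (not the MusicGen code): a "delay"-style codebook
# interleaving pattern, where codebook k is delayed by k steps so a
# single-stage LM can predict all K streams step by step.
import numpy as np

PAD = -1  # hypothetical padding id for positions with no token yet

def delay_interleave(codes: np.ndarray) -> np.ndarray:
    """codes: (K, T) token ids, one row per codebook.
    Returns a (K, T + K - 1) grid where row k is delayed by k steps."""
    K, T = codes.shape
    out = np.full((K, T + K - 1), PAD, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

# Toy example with K=4 codebooks and T=6 time steps.
codes = np.arange(24).reshape(4, 6)
print(delay_interleave(codes))
# At decoding position t the model emits codebook 0 of frame t, codebook 1 of
# frame t-1, and so on, so all streams are generated in a single pass.
```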