Polyffusion: A Diffusion Model for Polyphonic Score Generation with
Internal and External Controls
- URL: http://arxiv.org/abs/2307.10304v1
- Date: Wed, 19 Jul 2023 06:36:31 GMT
- Title: Polyffusion: A Diffusion Model for Polyphonic Score Generation with
Internal and External Controls
- Authors: Lejun Min, Junyan Jiang, Gus Xia, Jingwei Zhao
- Abstract summary: Polyffusion is a diffusion model that generates polyphonic music scores by regarding music as image-like piano roll representations.
We show that by using internal and external controls, Polyffusion unifies a wide range of music creation tasks.
- Score: 5.597394612661976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Polyffusion, a diffusion model that generates polyphonic music
scores by regarding music as image-like piano roll representations. The model
is capable of controllable music generation with two paradigms: internal
control and external control. Internal control refers to the process in which
users pre-define a part of the music and then let the model infill the rest,
similar to the task of masked music generation (or music inpainting). External
control conditions the model with external yet related information, such as
chord, texture, or other features, via the cross-attention mechanism. We show
that by using internal and external controls, Polyffusion unifies a wide range
of music creation tasks, including melody generation given accompaniment,
accompaniment generation given melody, arbitrary music segment inpainting, and
music arrangement given chords or textures. Experimental results show that our
model significantly outperforms existing Transformer and sampling-based
baselines, and using pre-trained disentangled representations as external
conditions yields more effective controls.
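As a rough illustration of the two control paradigms described above, the sketch below shows a generic DDPM-style sampling loop over an image-like piano roll. This is an assumption-laden sketch, not the authors' implementation: the trained noise-prediction network `denoiser(x, t, cond)`, its cross-attention conditioning on `cond` (e.g., a chord or texture embedding, standing in for external control), the linear noise schedule, and all tensor shapes are hypothetical stand-ins. Internal control is modeled by re-imposing a noised copy of the user-specified piano-roll region at every denoising step, in the spirit of masked inpainting.

```python
# Minimal sketch (assumption, not the authors' code): DDPM-style sampling
# over an image-like piano roll with internal and external controls.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample(denoiser, cond, known, mask, shape):
    """denoiser(x, t, cond): hypothetical trained noise-prediction net whose
    cross-attention layers attend to `cond` (chord/texture) -- external control.
    `known`/`mask`: user-specified piano-roll content and a 0/1 mask marking it;
    the known region is re-imposed every step -- internal control (inpainting).
    """
    x = torch.randn(shape)                               # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, torch.tensor([t]), cond)       # predict added noise
        mean = (x - (1.0 - alphas[t]) / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
            # internal control: overwrite the known region with the user's
            # content, noised to the current timestep's noise level
            noised_known = (torch.sqrt(alpha_bars[t - 1]) * known
                            + torch.sqrt(1.0 - alpha_bars[t - 1]) * torch.randn_like(known))
            x = mask * noised_known + (1.0 - mask) * x
        else:
            x = mask * known + (1.0 - mask) * mean       # final step: exact known region
    return x
```

Under these assumptions, setting `mask` to all zeros yields purely externally conditioned generation, while masking the accompaniment region and leaving the melody region free corresponds to tasks such as melody generation given accompaniment.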
Related papers
- MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching.
We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
arXiv Detail & Related papers (2024-10-27T15:35:41Z)
- Combining audio control and style transfer using latent diffusion [1.705371629600151]
In this paper, we aim to unify explicit control and style transfer within a single model.
Our model can generate audio matching a timbre target, while specifying structure either with explicit controls or through another audio example.
We show that our method can generate cover versions of complete musical pieces by transferring rhythmic and melodic content to the style of a target audio in a different genre.
arXiv Detail & Related papers (2024-07-31T23:27:27Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide range of applications for music generation, including inpainting, outpainting, and looping, as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z) - CoCoFormer: A controllable feature-rich polyphonic music generation
method [2.501600004190393]
This paper proposes the Condition Choir Transformer (CoCoFormer), which controls the output of the model by conditioning on chord and rhythm inputs at a fine-grained level.
Experiments show that CoCoFormer outperforms current models.
arXiv Detail & Related papers (2023-10-15T14:04:48Z) - Performance Conditioning for Diffusion-Based Multi-Instrument Music
Synthesis [15.670399197114012]
We propose enhancing control of multi-instrument synthesis by conditioning a generative model on a specific performance and recording environment.
Performance conditioning is a tool that guides the generative model to synthesize music with the style and timbre of specific instruments taken from specific performances.
Our prototype is evaluated on uncurated performances with diverse instrumentation, achieving state-of-the-art FAD realism scores.
arXiv Detail & Related papers (2023-09-21T17:44:57Z)
- Anticipatory Music Transformer [60.15347393822849]
We introduce anticipation: a method for constructing a controllable generative model of a temporal point process.
We focus on infilling control tasks, whereby the controls are a subset of the events themselves.
We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset.
arXiv Detail & Related papers (2023-06-14T16:27:53Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models [67.66825818489406]
This paper introduces a text-to-waveform music generation model based on diffusion models.
Our method incorporates free-form textual prompts as conditions to guide the waveform generation process.
We demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance.
arXiv Detail & Related papers (2023-02-09T06:27:09Z)
- MusIAC: An extensible generative framework for Music Infilling Applications with multi-level Control [11.811562596386253]
Infilling refers to the task of generating musical sections given the surrounding multi-track music.
The proposed framework is extensible to new control tokens, such as tonal tension per bar and track polyphony level.
We present the model in a Google Colab notebook to enable interactive generation.
arXiv Detail & Related papers (2022-02-11T10:02:21Z)
- RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)