Composer's Assistant 2: Interactive Multi-Track MIDI Infilling with Fine-Grained User Control
- URL: http://arxiv.org/abs/2407.14700v1
- Date: Fri, 19 Jul 2024 23:28:09 GMT
- Title: Composer's Assistant 2: Interactive Multi-Track MIDI Infilling with Fine-Grained User Control
- Authors: Martin E. Malandro,
- Abstract summary: Composer's Assistant 2 is a system for interactive human-computer composition in the REAPER digital audio workstation.
New controls give users fine-grained control over the system's outputs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Composer's Assistant 2, a system for interactive human-computer composition in the REAPER digital audio workstation. Our work upgrades the Composer's Assistant system (which performs multi-track infilling of symbolic music at the track-measure level) with a wide range of new controls to give users fine-grained control over the system's outputs. Controls introduced in this work include two types of rhythmic conditioning controls, horizontal and vertical note onset density controls, several types of pitch controls, and a rhythmic interest control. We train a T5-like transformer model to implement these controls and to serve as the backbone of our system. With these controls, we achieve a dramatic improvement in objective metrics over the original system. We also study how well our model understands the meaning of our controls, and we conduct a listening study that does not find a significant difference between real music and music composed in a co-creative fashion with our system. We release our complete system, consisting of source code, pretrained models, and REAPER scripts.
Related papers
- BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features [19.284531698181116]
BandControlNet is designed to tackle the multiple music sequences and generate high-quality music samples conditioned to the giventemporal control features.
The proposed BandControlNet outperforms other conditional music generation models on most objective metrics in terms of fidelity and inference speed.
The subjective evaluations show trained on short datasets can generate music with comparable quality to state-of-the-art models, while outperforming significantly using BandControlNet.
arXiv Detail & Related papers (2024-07-15T06:33:25Z) - MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z) - ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec [50.273832905535485]
We present ControlSpeech, a text-to-speech (TTS) system capable of fully mimicking the speaker's voice and enabling arbitrary control and adjustment of speaking style.
Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and adjustment capabilities or were unrelated to speaker-specific voice generation.
arXiv Detail & Related papers (2024-06-03T11:15:16Z) - Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt [50.25271407721519]
We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language.
We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation.
Experiments show that our model achieves favorable controlling ability and audio quality.
arXiv Detail & Related papers (2024-03-18T13:39:05Z) - Content-based Controls For Music Large Language Modeling [6.17674772485321]
Coco-Mulla is a content-based control method for music large language modeling.
It uses a parameter-efficient fine-tuning (PEFT) method tailored for Transformer-based audio models.
Our approach achieves high-quality music generation with low-resource semi-supervised learning.
arXiv Detail & Related papers (2023-10-26T05:24:38Z) - Anticipatory Music Transformer [60.15347393822849]
We introduce anticipation: a method for constructing a controllable generative model of a temporal point process.
We focus on infilling control tasks, whereby the controls are a subset of the events themselves.
We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset.
arXiv Detail & Related papers (2023-06-14T16:27:53Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - Composer's Assistant: An Interactive Transformer for Multi-Track MIDI
Infilling [0.0]
Composer's Assistant is a system for interactive human-computer composition in the REAPER digital audio workstation.
We train a T5-like model to accomplish the task of multi-track MIDI infilling.
Composer's Assistant consists of this model together with scripts that enable interaction with the model in REAPER.
arXiv Detail & Related papers (2023-01-29T19:45:10Z) - MusIAC: An extensible generative framework for Music Infilling
Applications with multi-level Control [11.811562596386253]
Infilling refers to the task of generating musical sections given the surrounding multi-track music.
The proposed framework is for new control tokens as the added control tokens such as tonal tension per bar and track polyphony level.
We present the model in a Google Colab notebook to enable interactive generation.
arXiv Detail & Related papers (2022-02-11T10:02:21Z) - Learning a Contact-Adaptive Controller for Robust, Efficient Legged
Locomotion [95.1825179206694]
We present a framework that synthesizes robust controllers for a quadruped robot.
A high-level controller learns to choose from a set of primitives in response to changes in the environment.
A low-level controller that utilizes an established control method to robustly execute the primitives.
arXiv Detail & Related papers (2020-09-21T16:49:26Z) - MMM : Exploring Conditional Multi-Track Music Generation with the
Transformer [9.569049935824227]
We propose a generative system based on the Transformer architecture that is capable of generating multi-track music.
We create a time-ordered sequence of musical events for each track and several tracks into a single sequence.
This takes advantage of the Transformer's attention-mechanism, which can adeptly handle long-term dependencies.
arXiv Detail & Related papers (2020-08-13T02:36:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.