Towards Automatic Instrumentation by Learning to Separate Parts in
Symbolic Multitrack Music
- URL: http://arxiv.org/abs/2107.05916v1
- Date: Tue, 13 Jul 2021 08:34:44 GMT
- Title: Towards Automatic Instrumentation by Learning to Separate Parts in
Symbolic Multitrack Music
- Authors: Hao-Wen Dong, Chris Donahue, Taylor Berg-Kirkpatrick, Julian McAuley
- Abstract summary: We study the feasibility of automatic instrumentation -- dynamically assigning instruments to notes in solo music during performance.
In addition to the online, real-time-capable setting for performative use cases, automatic instrumentation can also find applications in assistive composing tools in an offline setting.
We frame the task of part separation as a sequential multi-class classification problem and adopt machine learning to map sequences of notes into sequences of part labels.
- Score: 33.679951600368405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern keyboards allow a musician to play multiple instruments at the same
time by assigning zones -- fixed pitch ranges of the keyboard -- to different
instruments. In this paper, we aim to further extend this idea and examine the
feasibility of automatic instrumentation -- dynamically assigning instruments
to notes in solo music during performance. In addition to the online,
real-time-capable setting for performative use cases, automatic instrumentation
can also find applications in assistive composing tools in an offline setting.
Due to the lack of paired data of original solo music and their full
arrangements, we approach automatic instrumentation by learning to separate
parts (e.g., voices, instruments and tracks) from their mixture in symbolic
multitrack music, assuming that the mixture is to be played on a keyboard. We
frame the task of part separation as a sequential multi-class classification
problem and adopt machine learning to map sequences of notes into sequences of
part labels. To examine the effectiveness of our proposed models, we conduct a
comprehensive empirical evaluation over four diverse datasets of different
genres and ensembles -- Bach chorales, string quartets, game music and pop
music. Our experiments show that the proposed models outperform various
baselines. We also demonstrate the potential for our proposed models to produce
alternative convincing instrumentations for an existing arrangement by
separating its mixture into parts. All source code and audio samples can be
found at https://salu133445.github.io/arranger/ .
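As a reading aid, the framing described above -- mapping a sequence of notes to a sequence of part labels as sequential multi-class classification -- can be illustrated with a minimal sketch. The feature set, recurrent backbone, and dimensions below are assumptions for illustration only, not the models evaluated in the paper (the authors' actual implementation is in the linked repository); the `bidirectional` flag loosely mirrors the offline vs. online distinction discussed in the abstract.

```python
# Minimal sketch (not the paper's exact models): part separation as
# sequential multi-class classification over a note sequence.
# Features (pitch, onset, duration), hidden sizes, and the number of
# parts are illustrative assumptions.
import torch
import torch.nn as nn

class PartSeparator(nn.Module):
    def __init__(self, n_pitches=128, n_parts=4, hidden=128, bidirectional=True):
        super().__init__()
        # Embed each note's pitch; onset time and duration enter as raw scalars.
        self.pitch_emb = nn.Embedding(n_pitches, 32)
        self.rnn = nn.LSTM(
            input_size=32 + 2,          # pitch embedding + (onset, duration)
            hidden_size=hidden,
            batch_first=True,
            bidirectional=bidirectional,  # False would suit the online, causal setting
        )
        out_dim = hidden * (2 if bidirectional else 1)
        self.classifier = nn.Linear(out_dim, n_parts)

    def forward(self, pitches, onsets, durations):
        # pitches: (batch, seq) int64; onsets, durations: (batch, seq) float32
        x = torch.cat(
            [self.pitch_emb(pitches), onsets.unsqueeze(-1), durations.unsqueeze(-1)],
            dim=-1,
        )
        h, _ = self.rnn(x)
        return self.classifier(h)  # (batch, seq, n_parts) per-note part logits

# Toy usage: 2 sequences of 16 notes, 4 candidate parts (e.g., SATB voices).
model = PartSeparator()
pitches = torch.randint(0, 128, (2, 16))
onsets = torch.rand(2, 16)
durations = torch.rand(2, 16)
logits = model(pitches, onsets, durations)
labels = torch.randint(0, 4, (2, 16))
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 4), labels.reshape(-1))
```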
Related papers
- Subtractive Training for Music Stem Insertion using Latent Diffusion Models [35.91945598575059]
We present Subtractive Training, a method for synthesizing individual musical instrument stems given other instruments as context.
Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks.
We extend this technique to MIDI formats, successfully generating compatible bass, drum, and guitar parts for incomplete arrangements.
arXiv Detail & Related papers (2024-06-27T16:59:14Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio [11.941510958668557]
We call this task Musical Instrument Retrieval.
We propose a method for retrieving desired musical instruments using a reference music mixture as a query.
The proposed model consists of the Single-Instrument Encoder and the Multi-Instrument Encoder, both based on convolutional neural networks.
arXiv Detail & Related papers (2022-11-15T07:32:39Z)
- Jointist: Joint Learning for Multi-instrument Transcription and Its Applications [15.921536323391226]
Jointist is an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip.
Jointist consists of an instrument recognition module that conditions the other modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.
arXiv Detail & Related papers (2022-06-22T02:03:01Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as the backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that the proposed approach can generate coherent, novel, complex, and harmonious symphonies comparable to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance [88.0355290619761]
This work focuses on the separation of unknown musical instruments.
We propose the Separation-with-Consistency (SeCo) framework, which can accomplish the separation on unknown categories.
Our framework exhibits strong adaptation ability on novel musical categories and outperforms the baseline methods by a significant margin.
arXiv Detail & Related papers (2022-03-25T09:42:11Z)
- Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements [20.627164135805852]
We propose a novel system that takes as an input body movements of a musician playing a musical instrument and generates music in an unsupervised setting.
We build a pipeline named 'Multi-instrumentalistNet' that learns a discrete latent representation of various instruments' music from log-spectrograms.
We show that MIDI can further condition the latent space such that the pipeline generates the exact content of the music being played by the instrument in the video.
arXiv Detail & Related papers (2020-12-07T06:54:10Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.