Musika! Fast Infinite Waveform Music Generation
- URL: http://arxiv.org/abs/2208.08706v1
- Date: Thu, 18 Aug 2022 08:31:15 GMT
- Title: Musika! Fast Infinite Waveform Music Generation
- Authors: Marco Pasini, Jan Schlüter
- Abstract summary: We introduce Musika, a music generation system that can be trained on hundreds of hours of music using a single consumer GPU.
We achieve this by first learning a compact invertible representation of spectrogram magnitudes and phases with adversarial autoencoders.
A latent coordinate system enables generating arbitrarily long sequences of excerpts in parallel, while a global context vector allows the music to remain stylistically coherent through time.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fast and user-controllable music generation could enable novel ways of
composing or performing music. However, state-of-the-art music generation
systems require large amounts of data and computational resources for training,
and are slow at inference. This makes them impractical for real-time
interactive use. In this work, we introduce Musika, a music generation system
that can be trained on hundreds of hours of music using a single consumer GPU,
and that allows for much faster than real-time generation of music of arbitrary
length on a consumer CPU. We achieve this by first learning a compact
invertible representation of spectrogram magnitudes and phases with adversarial
autoencoders, then training a Generative Adversarial Network (GAN) on this
representation for a particular music domain. A latent coordinate system
enables generating arbitrarily long sequences of excerpts in parallel, while a
global context vector allows the music to remain stylistically coherent through
time. We perform quantitative evaluations to assess the quality of the
generated samples and showcase options for user control in piano and techno
music generation. We release the source code and pretrained autoencoder weights
at github.com/marcoppasini/musika, such that a GAN can be trained on a new
music domain with a single GPU in a matter of hours.
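As an illustration of the generation scheme the abstract describes, here is a minimal NumPy sketch of the latent coordinate idea: each latent excerpt is produced independently from its own coordinate plus a shared global context vector, so excerpts can be generated in parallel and concatenated into an arbitrarily long sequence. All names, shapes, and the placeholder generator are assumptions, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 64   # assumed width of one latent time step
EXCERPT_LEN = 32  # assumed latent time steps per excerpt

def generator(noise, coord, style):
    """Placeholder for the GAN generator: maps (noise, latent coordinate,
    global context vector) to one latent excerpt. A real model is a neural net."""
    return np.tanh(noise[:, None] + 0.1 * coord + style[None, :])

style = rng.normal(size=LATENT_DIM)   # global context vector, fixed for the piece
coords = np.arange(4)                 # coordinates of consecutive excerpts

# Each excerpt depends only on its own coordinate (plus the shared style),
# so all of them can be generated in parallel and then concatenated.
excerpts = [generator(rng.normal(size=EXCERPT_LEN), c, style) for c in coords]
latents = np.concatenate(excerpts, axis=0)   # (128, 64) latent sequence
# The paper's adversarial autoencoder would now invert this back to
# spectrogram magnitudes and phases, and from there to a waveform.
```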
Related papers
- MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation [19.878013881045817]
MusiConGen is a temporally-conditioned Transformer-based text-to-music model.
It integrates automatically-extracted rhythm and chords as the condition signal.
We show that MusiConGen can generate realistic backing track music that aligns well with the specified conditions.
arXiv Detail & Related papers (2024-07-21T05:27:53Z)
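A hedged sketch of the frame-level conditioning MusiConGen describes: automatically extracted chord and rhythm (beat) signals are embedded per frame and added to the music-token embeddings before the Transformer. Table sizes, widths, and names below are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                   # assumed embedding width
chord_table = rng.normal(size=(24, D))   # e.g. 12 major + 12 minor chords
beat_table = rng.normal(size=(2, D))     # beat / no-beat flag per frame

def condition(token_emb, chord_ids, beat_flags):
    """Add per-frame chord and rhythm embeddings to the token embeddings."""
    return token_emb + chord_table[chord_ids] + beat_table[beat_flags]

frames = 8
token_emb = rng.normal(size=(frames, D))
chord_ids = np.array([0, 0, 7, 7, 5, 5, 0, 0])   # extracted automatically in the paper
beat_flags = np.array([1, 0, 1, 0, 1, 0, 1, 0])
print(condition(token_emb, chord_ids, beat_flags).shape)  # (8, 16)
```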
- Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls [6.176747724853209]
Large Language Models (LLMs) have shown promise in generating high-quality music, but their focus on autoregressive generation limits their utility in music editing tasks.
We propose a novel approach leveraging a parameter-efficient heterogeneous adapter combined with a masking training scheme.
Our method integrates frame-level content-based controls, facilitating track-conditioned music refinement and score-conditioned music arrangement.
arXiv Detail & Related papers (2024-02-14T19:00:01Z)
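The adapter mechanism in the entry above can be pictured with a generic residual bottleneck adapter inserted into a frozen backbone. This is a standard parameter-efficient module, not the paper's specific heterogeneous adapter, and all sizes are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 32, 4                      # hidden width and adapter bottleneck (assumed)
W_down = rng.normal(scale=0.02, size=(D, R))
W_up = np.zeros((R, D))           # zero-init so the adapter starts as the identity

def adapter(hidden):
    """Residual bottleneck adapter added to a frozen Transformer layer's output."""
    return hidden + np.maximum(hidden @ W_down, 0.0) @ W_up  # ReLU bottleneck

hidden = rng.normal(size=(8, D))  # (frames, hidden) from the frozen backbone
print(np.allclose(adapter(hidden), hidden))  # True: identity before training
```

Only W_down and W_up would be trained, which is what makes the scheme parameter-efficient.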
- MAGMA: Music Aligned Generative Motion Autodecoder [15.825872274297735]
We introduce a 2-step approach for generating dance using a Vector Quantized-Variational Autoencoder (VQ-VAE).
We also evaluate the importance of music representations by comparing naive music feature extraction using Librosa to deep audio representations generated by state-of-the-art audio compression algorithms.
Our proposed approach achieves state-of-the-art results on music-to-motion generation benchmarks and enables real-time generation of considerably longer motion sequences.
arXiv Detail & Related papers (2023-09-03T15:21:47Z)
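The "naive" Librosa baseline mentioned in the entry above might look like the following sketch. Librosa is a real library and these calls exist, but the exact feature set the paper compares against deep codec representations is an assumption here (MFCCs plus chroma).

```python
import librosa
import numpy as np

def naive_music_features(path: str) -> np.ndarray:
    """Frame-level handcrafted features: MFCCs stacked with chroma."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # (20, frames)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)     # (12, frames)
    return np.vstack([mfcc, chroma]).T                   # (frames, 32)

# The deep alternative would replace this with latents from a pretrained
# neural audio codec; the paper compares the two as motion-model inputs.
```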
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
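One efficient interleaving pattern for parallel codec-token streams is a "delay" pattern, where codebook k is shifted right by k steps so a single Transformer can predict all streams one time step at a time. The sketch below is a simplified illustration, not MusicGen's exact implementation.

```python
import numpy as np

PAD = -1  # placeholder token for positions with no code yet

def delay_interleave(codes: np.ndarray) -> np.ndarray:
    """Shift codebook k right by k steps so all streams can be modeled
    jointly by one autoregressive LM."""
    K, T = codes.shape
    out = np.full((K, T + K - 1), PAD)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

codes = np.arange(8).reshape(2, 4)   # 2 codebooks, 4 time steps
print(delay_interleave(codes))
# [[ 0  1  2  3 -1]
#  [-1  4  5  6  7]]
```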
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the computational cost.
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
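The fine-/coarse-grained attention in the entry above can be made concrete as a boolean attention mask. Bar lengths, the set of "structure-related" bars, and the placement of one summary token per bar are assumptions for illustration, not Museformer's actual configuration.

```python
import numpy as np

TOKENS_PER_BAR = 4
N_BARS = 4
related = {3: {0, 1}}   # assume bar 3 is structurally related to bars 0 and 1

n = N_BARS * (TOKENS_PER_BAR + 1)      # each bar ends with one summary token
mask = np.zeros((n, n), dtype=bool)    # mask[i, j]: may token i attend to token j?

def span(bar):
    """Token index range of one bar, including its trailing summary token."""
    s = bar * (TOKENS_PER_BAR + 1)
    return s, s + TOKENS_PER_BAR + 1

for b in range(N_BARS):
    s, e = span(b)
    mask[s:e, s:e] = True                        # fine: attend within the own bar
    for other in range(N_BARS):
        if other == b:
            continue
        os_, oe = span(other)
        if other in related.get(b, set()):
            mask[s:e, os_:oe] = True             # fine: all tokens of related bars
        else:
            mask[s:e, oe - 1] = True             # coarse: only that bar's summary token

print(mask.sum(), "allowed pairs out of", n * n)
```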
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements [20.627164135805852]
We propose a novel system that takes as input the body movements of a musician playing a musical instrument and generates music in an unsupervised setting.
We build a pipeline named 'Multi-instrumentalistNet' that learns a discrete latent representation of various instruments' music from log-spectrograms.
We show that MIDI can further condition the latent space such that the pipeline generates the exact content of the music being played by the instrument in the video.
arXiv Detail & Related papers (2020-12-07T06:54:10Z)
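A discrete latent over log-spectrogram frames, as in the entry above, can be sketched with a VQ-style nearest-codebook lookup. Codebook size and frame width are assumptions, and the paper's pipeline is considerably more elaborate than this toy version.

```python
import numpy as np

rng = np.random.default_rng(0)
CODES, D = 128, 80                     # codebook entries, mel bins per frame (assumed)
codebook = rng.normal(size=(CODES, D))

def quantize(frames: np.ndarray) -> np.ndarray:
    """Map each log-spectrogram frame to the index of its nearest codebook entry."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, CODES)
    return d2.argmin(axis=1)

log_spec = rng.normal(size=(16, D))    # 16 frames of a log-spectrogram
print(quantize(log_spec)[:5])          # first five discrete token ids
```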
- Lets Play Music: Audio-driven Performance Video Generation [58.77609661515749]
We propose a new task named Audio-driven Performance Video Generation (APVG).
APVG aims to synthesize the video of a person playing a certain instrument guided by a given music audio clip.
arXiv Detail & Related papers (2020-11-05T03:13:46Z)
- Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation [69.06413031969674]
Aug-Gen is a method of dataset augmentation for any music generation system trained on a resource-constrained domain.
We apply Aug-Gen to Transformer-based chorale generation in the style of J.S. Bach, and show that this allows for longer training and results in better generative output.
arXiv Detail & Related papers (2020-06-23T21:06:15Z)
- RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)