LoopGen: Training-Free Loopable Music Generation
- URL: http://arxiv.org/abs/2504.04466v2
- Date: Tue, 08 Apr 2025 06:13:10 GMT
- Title: LoopGen: Training-Free Loopable Music Generation
- Authors: Davide Marincione, Giorgio Strano, Donato Crisostomi, Roberto Ribuoli, Emanuele Rodolà
- Abstract summary: We modify a non-autoregressive model (MAGNeT) to generate tokens in a circular pattern, letting the model attend to the beginning of the audio when creating its ending. We evaluate the consistency of loop transitions by computing token perplexity around the seam of the loop, observing a 55% improvement.
- Score: 12.663333784905578
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Loops--short audio segments designed for seamless repetition--are central to many music genres, particularly those rooted in dance and electronic styles. However, current generative music models struggle to produce truly loopable audio, as generating a short waveform alone does not guarantee a smooth transition from its endpoint back to its start, often resulting in audible discontinuities. We address this gap by modifying a non-autoregressive model (MAGNeT) to generate tokens in a circular pattern, letting the model attend to the beginning of the audio when creating its ending. This inference-only approach results in generations that are aware of future context and loop naturally, without the need for any additional training or data. We evaluate the consistency of loop transitions by computing token perplexity around the seam of the loop, observing a 55% improvement. Blind listening tests further confirm significant perceptual gains over baseline methods, improving mean ratings by 70%. Taken together, these results highlight the effectiveness of inference-only approaches in improving generative models and underscore the advantages of non-autoregressive methods for context-aware music generation.
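The circular-generation idea lends itself to a compact illustration. The sketch below is a toy reading of the approach, not the authors' implementation: `model_step` is a hypothetical stand-in for one MAGNeT-style masked-prediction step, the circular behaviour is approximated by rolling the token sequence by a random offset before each decoding iteration so the loop seam regularly sits mid-context, and `seam_perplexity` shows, under the same assumptions, how a seam score could be computed on the sequence wrapped onto itself.

```python
import numpy as np

def circular_decode(model_step, seq_len, n_iters=10, seed=0):
    """Toy sketch of circular, non-autoregressive masked decoding.

    `model_step(tokens, mask)` is a hypothetical callable standing in for one
    MAGNeT-style prediction step: given the partially filled sequence (masked
    positions marked with -1) it returns predicted tokens and per-position
    confidences. Rolling the sequence by a random offset before every
    iteration means the positions around the loop seam (index 0 / seq_len-1)
    are regularly seen in the middle of the context, so the ending is
    generated with the beginning in view.
    """
    rng = np.random.default_rng(seed)
    tokens = np.full(seq_len, -1, dtype=np.int64)   # -1 = still masked
    mask = np.ones(seq_len, dtype=bool)

    for it in range(n_iters):
        if not mask.any():
            break
        shift = int(rng.integers(seq_len))          # random circular offset
        r_tokens, r_mask = np.roll(tokens, shift), np.roll(mask, shift)

        pred, conf = model_step(r_tokens, r_mask)

        # Keep the most confident predictions, unmasking more each iteration.
        thresh = np.quantile(conf[r_mask], 1.0 - (it + 1) / n_iters)
        keep = r_mask & (conf >= thresh)
        r_tokens[keep] = pred[keep]
        r_mask &= ~keep

        tokens, mask = np.roll(r_tokens, -shift), np.roll(r_mask, -shift)
    return tokens


def seam_perplexity(token_logprob, tokens, window=16):
    """Hypothetical seam metric in the spirit of the paper's evaluation: wrap
    the loop onto itself and measure average token perplexity in a window
    straddling the join. `token_logprob` is an assumed callable returning
    per-token log-probabilities under some reference model."""
    wrapped = np.concatenate([tokens[-window:], tokens[:window]])
    return float(np.exp(-np.mean(token_logprob(wrapped))))
```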
Related papers
- SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation [75.86473375730392]
SongGen is a fully open-source, single-stage auto-regressive transformer for controllable song generation.
It supports two output modes: mixed mode, which generates a mixture of vocals and accompaniment directly, and dual-track mode, which synthesizes them separately.
To foster community engagement and future research, we will release our model weights, training code, annotated data, and preprocessing pipeline.
arXiv Detail & Related papers (2025-02-18T18:52:21Z) - MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching.
We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
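A rough sketch of how a masked-prediction objective can be combined with flow matching, under assumed tensor shapes and an assumed `model(x_t, t, context)` interface rather than the actual MusicFlow recipe:

```python
import torch

def masked_flow_matching_loss(model, x1, mask_ratio=0.7):
    """Schematic masked flow-matching loss (assumed form, not the exact
    MusicFlow objective). `x1` holds target feature sequences of shape
    (batch, time, dim); `model(x_t, t, context)` is a hypothetical network
    predicting the velocity of the probability path at time t, conditioned
    on the unmasked frames."""
    b, t_len, d = x1.shape
    x0 = torch.randn_like(x1)                        # noise endpoint of the path
    t = torch.rand(b, 1, 1, device=x1.device)        # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                      # linear interpolation path
    v_target = x1 - x0                               # constant velocity target

    mask = torch.rand(b, t_len, 1, device=x1.device) < mask_ratio
    context = torch.where(mask, torch.zeros_like(x1), x1)  # visible frames only

    v_pred = model(x_t, t, context)
    # Supervise only the masked frames; reusing the same model with different
    # masks is what enables infilling and continuation at inference time.
    num = ((v_pred - v_target) ** 2 * mask).sum()
    den = (mask.sum() * d).clamp(min=1)
    return num / den
```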
arXiv Detail & Related papers (2024-10-27T15:35:41Z) - MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z) - DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide range of applications for music generation, including inpainting, outpainting, and looping, as well as intensity, melody, and musical structure control.
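The inference-time optimization loop at the heart of this approach can be sketched generically; `sampler` and `loss_fn` below are assumed placeholders (a differentiable pass through a frozen diffusion sampler and a control-target loss), not the official DITTO implementation:

```python
import torch

def inference_time_optimize(sampler, loss_fn, latent_shape, steps=50, lr=0.05):
    """Generic sketch of optimizing the initial noise latent at inference time.

    `sampler(z)` is a hypothetical differentiable function that runs the
    frozen diffusion model from initial latent `z` to an output; `loss_fn`
    scores that output against the desired control target (an intensity
    curve, a melody, a looping constraint, ...). No model weights change."""
    z = torch.randn(latent_shape, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(sampler(z))   # run sampling, score the result
        loss.backward()              # gradients flow back to the initial noise
        opt.step()
    with torch.no_grad():
        return sampler(z)
```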
arXiv Detail & Related papers (2024-01-22T18:10:10Z) - Controllable Music Production with Diffusion Models and Guidance Gradients [3.187381965457262]
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in 44.1kHz stereo audio.
The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips.
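One common way such tasks are handled with a frozen diffusion model is reconstruction guidance; the sketch below is a generic, assumed formulation with a hypothetical `denoiser` interface, not the paper's exact procedure:

```python
import torch

def guided_step(denoiser, x_t, t, known, known_mask, scale=1.0):
    """Schematic guidance-gradient update for inpainting/continuation.

    `denoiser(x_t, t)` is a hypothetical frozen model returning an estimate
    x0_hat of the clean audio. The gradient of a reconstruction loss on the
    known region nudges the noisy state so the final sample stays consistent
    with the fixed context, without any task-specific training."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)
    recon = ((x0_hat - known) ** 2 * known_mask).sum()
    grad, = torch.autograd.grad(recon, x_t)
    # The guided x_t would then be fed to the usual reverse-diffusion update.
    return x_t.detach() - scale * grad
```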
arXiv Detail & Related papers (2023-11-01T16:01:01Z) - Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token in a given bar directly attends to all tokens of the bars most relevant to musical structure.
With the coarse-grained attention, a token attends only to summaries of the other bars rather than to each of their tokens, which reduces the computational cost.
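As a rough illustration of the fine-grained half of the pattern described in the entry above, the toy mask below lets each token attend token-by-token to its own bar and to bars at a few structure-relevant offsets; the per-bar summary tokens carrying the coarse-grained view are omitted for brevity, and the offsets are assumptions rather than the paper's exact configuration.

```python
import numpy as np

def fine_grained_mask(bar_ids, related_offsets=(1, 2, 4, 8)):
    """Toy fine-grained attention mask (assumed layout, not the official
    Museformer code). `bar_ids[i]` gives the bar index of token i; a token
    attends causally to tokens in its own bar and in bars at a few
    structure-relevant offsets behind it."""
    bar_ids = np.asarray(bar_ids)
    n = len(bar_ids)
    mask = np.zeros((n, n), dtype=bool)
    causal = np.arange(n)
    for i in range(n):
        allowed = {int(bar_ids[i])} | {int(bar_ids[i]) - off for off in related_offsets}
        mask[i] = np.isin(bar_ids, sorted(allowed)) & (causal <= i)
    return mask

# Example: 6 bars of 4 tokens each.
bar_ids = [b for b in range(6) for _ in range(4)]
print(fine_grained_mask(bar_ids).sum(), "allowed token-to-token attention pairs")
```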
arXiv Detail & Related papers (2022-10-19T07:31:56Z) - Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length.
Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
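A compact event-per-note encoding in the spirit of such multitrack representations could look like the sketch below; the field names and example values are illustrative assumptions, not the paper's exact vocabulary.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    """Illustrative multitrack note event (assumed fields): carrying the
    instrument inline with each note lets tracks interleave freely without
    per-track headers, keeping the overall sequence short."""
    beat: int        # bar-level time position
    position: int    # subdivision within the beat
    pitch: int       # MIDI pitch number
    duration: int    # length in subdivisions
    instrument: int  # MIDI program number

# Two notes from two different instruments, encoded as just two events.
events = [NoteEvent(0, 0, 60, 4, 0), NoteEvent(0, 0, 43, 8, 33)]
print(f"{len(events)} events encode {len({e.instrument for e in events})} instruments")
```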
arXiv Detail & Related papers (2022-07-14T15:06:37Z) - Catch-A-Waveform: Learning to Generate Audio from a Single Short Example [33.96833901121411]
We present a GAN-based generative model that can be trained on one short audio signal from any domain.
We show that, in all cases, no more than 20 seconds of training audio suffice for our model to achieve state-of-the-art results.
arXiv Detail & Related papers (2021-06-11T14:35:11Z) - LoopNet: Musical Loop Synthesis Conditioned On Intuitive Musical Parameters [12.72202888016628]
LoopNet is a feed-forward generative model for creating loops conditioned on intuitive parameters.
We leverage Music Information Retrieval (MIR) models as well as a large collection of public loop samples in our study.
arXiv Detail & Related papers (2021-05-21T14:24:34Z) - Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method combines an acoustic model trained for automatic speech recognition with melody features extracted from the input to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z) - Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization [48.55126268721948]
We present a generative adversarial network (GAN)-based model for unconditional generation of the mel-spectrograms of singing voices.
We employ a hierarchical architecture in the generator to induce some structure in the temporal dimension.
We evaluate the performance of the new model not only for generating singing voices, but also for generating speech voices.
arXiv Detail & Related papers (2020-05-18T08:35:16Z)