SongDriver: Real-time Music Accompaniment Generation without Logical
Latency nor Exposure Bias
- URL: http://arxiv.org/abs/2209.06054v1
- Date: Tue, 13 Sep 2022 15:05:27 GMT
- Title: SongDriver: Real-time Music Accompaniment Generation without Logical
Latency nor Exposure Bias
- Authors: Zihao Wang, Kejun Zhang, Yuxing Wang, Chen Zhang, Qihao Liang, Pengfei
Yu, Yongsheng Feng, Wenbo Liu, Yikai Wang, Yuntai Bao, Yiheng Yang
- Abstract summary: SongDriver is a real-time music accompaniment generation system without logical latency or exposure bias.
We train SongDriver on several open-source datasets and an original aiSong Dataset built from Chinese-style modern pop music scores.
The results show that SongDriver outperforms existing state-of-the-art (SOTA) models on both objective and subjective metrics.
- Score: 15.7153621508319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time music accompaniment generation has a wide range of applications in
the music industry, such as music education and live performances. However,
automatic real-time music accompaniment generation is still understudied and
often faces a trade-off between logical latency and exposure bias. In this
paper, we propose SongDriver, a real-time music accompaniment generation system
without logical latency or exposure bias. Specifically, SongDriver divides the
accompaniment generation task into two phases: 1) The arrangement phase, where
a Transformer model first arranges chords for input melodies in real-time, and
caches the chords for the next phase instead of playing them out. 2) The
prediction phase, where a CRF model generates playable multi-track
accompaniments for the coming melodies based on previously cached chords. With
this two-phase strategy, SongDriver directly generates the accompaniment for
the upcoming melody, achieving zero logical latency. Furthermore, when
predicting chords for a timestep, SongDriver refers to the cached chords from
the first phase rather than its previous predictions, which avoids the exposure
bias problem. Since the input length is often constrained under real-time
conditions, another potential problem is the loss of long-term sequential
information. To make up for this disadvantage, we extract four musical features
from a long-term music piece before the current time step as global
information. In the experiment, we train SongDriver on several open-source
datasets and an original aiSong Dataset built from Chinese-style modern pop
music scores. The results show that SongDriver outperforms existing
state-of-the-art (SOTA) models on both objective and subjective metrics, while
significantly reducing physical latency.
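Read as pseudocode, the two-phase strategy amounts to the loop sketched below. This is a hypothetical illustration only: the stub functions stand in for the paper's Transformer arranger and CRF predictor, and every name (`arrange_chord`, `predict_accompaniment`, `extract_global_features`) is invented for the example.

```python
from collections import deque

# --- Hypothetical stubs; the paper uses a Transformer (phase 1) and a
# --- CRF (phase 2).  All names here are invented for illustration.
def melody_stream():
    yield from ([60], [62], [64], [65])        # toy melody frames

def play(melody_frame, accompaniment):
    print(melody_frame, accompaniment)

def extract_global_features(history):
    # The paper extracts four long-term musical features from the piece
    # so far to offset the short real-time input window.
    return {"length": len(history)}

def arrange_chord(recent_melody, global_features):
    return "C:maj"                             # placeholder chord

def predict_accompaniment(melody_frame, cached_chords):
    return {"piano": cached_chords[-1] if cached_chords else None}

chord_cache = deque(maxlen=8)  # arranged chords are cached, never played
history = []

for melody_frame in melody_stream():
    history.append(melody_frame)

    # Phase 2: the accompaniment for the frame about to sound is derived
    # from chords cached earlier, so playback never waits on generation
    # (zero logical latency).
    play(melody_frame, predict_accompaniment(melody_frame, list(chord_cache)))

    # Phase 1: arrange a chord for the current frame and cache it for
    # future predictions.  Conditioning on cached arranger outputs rather
    # than on the model's own played output avoids exposure bias.
    chord_cache.append(
        arrange_chord(history[-16:], extract_global_features(history)))
```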
Related papers
- Beat this! Accurate beat tracking without DBN postprocessing [4.440100868992127]
We propose a system for tracking beats and downbeats with two objectives: generality across a diverse music range, and high accuracy.
We achieve generality by training on multiple datasets, including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations.
For high accuracy, we develop a loss function tolerant to small time shifts of annotations, and an architecture alternating convolutions with transformers either over frequency or time.
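The shift-tolerant idea can be sketched as a max-pooled binary cross-entropy: each annotated beat frame is credited with the best prediction within a small window around it. This is a simplified sketch assuming frame-wise binary beat targets, not the paper's exact loss; the function name and tolerance value are invented for the example.

```python
import torch
import torch.nn.functional as F

def shift_tolerant_bce(logits, targets, tolerance=3):
    """Frame-wise BCE that forgives small temporal misalignments.

    Positive (beat) frames are scored against the maximum logit within
    +/- `tolerance` frames, so a prediction a few frames away from the
    annotation is not penalised; negative frames keep their raw logits.
    logits, targets: (batch, frames) tensors, targets in {0, 1}.
    (A fuller version would also ignore negatives adjacent to beats.)
    """
    # Max over a small window centred on each frame.
    pooled = F.max_pool1d(
        logits.unsqueeze(1),
        kernel_size=2 * tolerance + 1,
        stride=1,
        padding=tolerance,
    ).squeeze(1)

    pos = F.binary_cross_entropy_with_logits(pooled, targets, reduction="none")
    neg = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (targets * pos + (1 - targets) * neg).mean()

# Toy usage: the annotated beat at frame 2 is credited with the nearby
# peak at frame 1 instead of its own low logit.
logits = torch.tensor([[-5.0, 4.0, -5.0, -5.0]])
targets = torch.tensor([[0.0, 0.0, 1.0, 0.0]])
print(shift_tolerant_bce(logits, targets))
```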
arXiv Detail & Related papers (2024-07-31T14:59:17Z)
- BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features [19.284531698181116]
BandControlNet is designed to handle multiple parallel music sequences and to generate high-quality music samples conditioned on the given spatiotemporal control features.
The proposed BandControlNet outperforms other conditional music generation models on most objective metrics of fidelity and inference speed.
The subjective evaluations show that, even when trained on short datasets, BandControlNet can generate music of quality comparable to state-of-the-art models, while significantly outperforming them.
arXiv Detail & Related papers (2024-07-15T06:33:25Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
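A toy version of such a 2D token grid might look like the following; the token ids and track names are placeholders, and the real GETScore carries more per-cell information (e.g. separate pitch and duration rows) than this sketch.

```python
import numpy as np

PAD = 0  # placeholder id for "nothing sounds in this cell"

# One row per track, one column per time step: tracks are stacked
# vertically and time progresses horizontally.
tracks = ["melody", "bass", "drums"]
n_steps = 16
score = np.full((len(tracks), n_steps), PAD, dtype=np.int64)

score[0, 0] = 60   # melody: token for C4 at step 0 (ids are arbitrary)
score[1, 0] = 36   # bass:   token for C2 at step 0
score[2, 0] = 1    # drums:  token for a kick at step 0

# Arbitrary source-target combinations reduce to a boolean mask over rows:
# here we condition on the melody row and generate the bass and drum rows.
target_mask = np.zeros_like(score, dtype=bool)
target_mask[1:, :] = True

print(score.shape)  # (3, 16)
```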
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the computational cost.
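In sketch form, this amounts to building an attention mask in which a bar's tokens see every token of its structure-related bars, but only one summary token for each remaining bar. The helper below is a hypothetical simplification; the choice of related bars and the handling of summary tokens follow the paper only loosely, and causal masking is omitted for brevity.

```python
import numpy as np

def fine_coarse_mask(bar_of_token, related_bars, summary_token_of_bar):
    """Boolean mask[q, k] = True where query token q may attend to key k.

    bar_of_token:         bar index of every token in the flat sequence
    related_bars:         for each bar, the set of structure-related bars
                          (e.g. the 1st, 2nd, 4th, ... previous bars)
    summary_token_of_bar: index of the summary token appended to each bar
    """
    n = len(bar_of_token)
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        bq = bar_of_token[q]
        for k in range(n):
            bk = bar_of_token[k]
            if bk == bq or bk in related_bars[bq]:
                mask[q, k] = True    # fine-grained: attend to every token
            elif k == summary_token_of_bar[bk]:
                mask[q, k] = True    # coarse-grained: summary token only
    return mask

# Toy usage: 2 bars of 3 tokens each; the last token of each bar is its
# summary token, and bar 1 treats bar 0 as structure-related.
bars = [0, 0, 0, 1, 1, 1]
mask = fine_coarse_mask(bars, related_bars={0: set(), 1: {0}},
                        summary_token_of_bar={0: 2, 1: 5})
print(mask.astype(int))
```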
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
- Musika! Fast Infinite Waveform Music Generation [0.0]
We introduce Musika, a music generation system that can be trained on hundreds of hours of music using a single consumer GPU.
We achieve this by first learning a compact invertible representation of spectrogram magnitudes and phases with adversarial autoencoders.
A latent coordinate system enables generating arbitrarily long sequences of excerpts in parallel, while a global context vector allows the music to remain stylistically coherent through time.
arXiv Detail & Related papers (2022-08-18T08:31:15Z)
- Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment.
arXiv Detail & Related papers (2022-08-11T08:44:47Z)
- Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length.
Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
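A compact multitrack encoding in this spirit can be sketched as one fixed-size event per note, with the instrument carried inside the event so all tracks share a single short sequence. The field names here only loosely follow the paper's description.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    beat: int        # coarse time (which beat)
    position: int    # fine time within the beat
    pitch: int       # MIDI pitch number
    duration: int    # length in fine time units
    instrument: int  # MIDI program, so any instrument can appear anywhere

song = [
    NoteEvent(beat=0, position=0, pitch=60, duration=4, instrument=0),   # piano C4
    NoteEvent(beat=0, position=0, pitch=36, duration=8, instrument=33),  # bass C2
    NoteEvent(beat=1, position=2, pitch=64, duration=2, instrument=0),   # piano E4
]

# One event per note (instead of several tokens per note) keeps the
# sequence short even with many simultaneous instruments.
song.sort(key=lambda e: (e.beat, e.position, e.instrument))
print(len(song), "events")
```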
arXiv Detail & Related papers (2022-07-14T15:06:37Z)
- Differential Music: Automated Music Generation Using LSTM Networks with Representation Based on Melodic and Harmonic Intervals [0.0]
This paper presents a generative AI model for automated music composition with LSTM networks.
It takes a novel approach to encoding musical information, based on melodic movement rather than absolute pitch.
Experimental results are promising: the generated pieces sound musical and tonal.
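The movement-based encoding reduces to storing intervals between consecutive notes instead of the notes themselves, which makes the representation transposition-invariant. A minimal illustration (MIDI pitch numbers are used for concreteness):

```python
# Encode a melody as intervals (pitch deltas) rather than absolute pitches.
melody = [60, 62, 64, 62, 67]                  # C4 D4 E4 D4 G4
intervals = [b - a for a, b in zip(melody, melody[1:])]
print(intervals)                               # [2, 2, -2, 5]

# Decoding needs only a starting pitch, so the same interval sequence
# reproduces the tune in any key.
start = 65                                     # F4: transposed up a fourth
decoded = [start]
for step in intervals:
    decoded.append(decoded[-1] + step)
print(decoded)                                 # [65, 67, 69, 67, 72]
```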
arXiv Detail & Related papers (2021-08-23T23:51:08Z)
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGANs).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
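Flattening several tracks into one sequence might look like the sketch below: time tokens (bar, position) are shared, track tokens switch the current instrument, and note tokens follow. The token names are illustrative, not MuMIDI's exact vocabulary.

```python
# A single flat sequence covering all tracks at once, MuMIDI-style.
sequence = [
    "<bar>",
    "<pos_0>",
    "<track_melody>", "<pitch_60>", "<dur_4>",
    "<track_bass>",   "<pitch_36>", "<dur_8>",
    "<pos_4>",
    "<track_melody>", "<pitch_64>", "<dur_2>",
    "<bar>",
    "<pos_0>",
    "<track_drums>",  "<pitch_36>", "<dur_1>",
]

# One autoregressive model over this sequence generates every track in a
# single pass, but the flattened sequence is much longer than any single
# track - the long-term modeling challenge the paper addresses.
print(len(sequence), "tokens for two bars of three tracks")
```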
arXiv Detail & Related papers (2020-08-18T02:28:36Z)