SongDriver: Real-time Music Accompaniment Generation without Logical
Latency nor Exposure Bias
- URL: http://arxiv.org/abs/2209.06054v1
- Date: Tue, 13 Sep 2022 15:05:27 GMT
- Title: SongDriver: Real-time Music Accompaniment Generation without Logical
Latency nor Exposure Bias
- Authors: Zihao Wang, Kejun Zhang, Yuxing Wang, Chen Zhang, Qihao Liang, Pengfei
Yu, Yongsheng Feng, Wenbo Liu, Yikai Wang, Yuntai Bao, Yiheng Yang
- Abstract summary: SongDriver is a real-time music accompaniment generation system without logical latency or exposure bias.
We train SongDriver on several open-source datasets and an original aiSong Dataset built from Chinese-style modern pop music scores.
The results show that SongDriver outperforms existing state-of-the-art (SOTA) models on both objective and subjective metrics.
- Score: 15.7153621508319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time music accompaniment generation has a wide range of applications in
the music industry, such as music education and live performances. However,
automatic real-time music accompaniment generation is still understudied and
often faces a trade-off between logical latency and exposure bias. In this
paper, we propose SongDriver, a real-time music accompaniment generation system
without logical latency or exposure bias. Specifically, SongDriver divides the
accompaniment generation task into two phases: 1) The arrangement phase, where
a Transformer model first arranges chords for input melodies in real-time, and
caches the chords for the next phase instead of playing them out. 2) The
prediction phase, where a CRF model generates playable multi-track
accompaniments for the coming melodies based on previously cached chords. With
this two-phase strategy, SongDriver directly generates the accompaniment for
the upcoming melody, achieving zero logical latency. Furthermore, when
predicting chords for a timestep, SongDriver refers to the cached chords from
the first phase rather than its previous predictions, which avoids the exposure
bias problem. Since the input length is often constrained under real-time
conditions, another potential problem is the loss of long-term sequential
information. To make up for this disadvantage, we extract four musical features
from a long-term music piece before the current time step as global
information. In the experiment, we train SongDriver on several open-source
datasets and an original aiSong Dataset built from Chinese-style modern pop
music scores. The results show that SongDriver outperforms existing
state-of-the-art (SOTA) models on both objective and subjective metrics, while
significantly reducing physical latency.
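Read as pseudocode, the two-phase strategy amounts to the loop sketched below. This is a hypothetical illustration only: the stub functions stand in for the paper's Transformer arranger and CRF predictor, and every name (`arrange_chord`, `predict_accompaniment`, `extract_global_features`) is invented for the example.

```python
from collections import deque

# --- Hypothetical stubs; the paper uses a Transformer (phase 1) and a
# --- CRF (phase 2).  All names here are invented for illustration.
def melody_stream():
    yield from ([60], [62], [64], [65])        # toy melody frames

def play(melody_frame, accompaniment):
    print(melody_frame, accompaniment)

def extract_global_features(history):
    # The paper extracts four long-term musical features from the piece
    # so far to offset the short real-time input window.
    return {"length": len(history)}

def arrange_chord(recent_melody, global_features):
    return "C:maj"                             # placeholder chord

def predict_accompaniment(melody_frame, cached_chords):
    return {"piano": cached_chords[-1] if cached_chords else None}

chord_cache = deque(maxlen=8)  # arranged chords are cached, never played
history = []

for melody_frame in melody_stream():
    history.append(melody_frame)

    # Phase 2: the accompaniment for the frame about to sound is derived
    # from chords cached earlier, so playback never waits on generation
    # (zero logical latency).
    play(melody_frame, predict_accompaniment(melody_frame, list(chord_cache)))

    # Phase 1: arrange a chord for the current frame and cache it for
    # future predictions.  Conditioning on cached arranger outputs rather
    # than on the model's own played output avoids exposure bias.
    chord_cache.append(
        arrange_chord(history[-16:], extract_global_features(history)))
```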
Related papers
- Beat this! Accurate beat tracking without DBN postprocessing [4.440100868992127]
We propose a system for tracking beats and downbeats with two objectives: generality across a diverse music range, and high accuracy.
We achieve generality by training on multiple datasets, including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations.
For high accuracy, we develop a loss function tolerant to small time shifts of annotations, and an architecture alternating convolutions with transformers either over frequency or time.
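The shift-tolerant idea can be sketched as a max-pooled binary cross-entropy: each annotated beat frame is credited with the best prediction within a small window around it. This is a simplified sketch assuming frame-wise binary beat targets, not the paper's exact loss; the function name and tolerance value are invented for the example.

```python
import torch
import torch.nn.functional as F

def shift_tolerant_bce(logits, targets, tolerance=3):
    """Frame-wise BCE that forgives small temporal misalignments.

    Positive (beat) frames are scored against the maximum logit within
    +/- `tolerance` frames, so a prediction a few frames away from the
    annotation is not penalised; negative frames keep their raw logits.
    logits, targets: (batch, frames) tensors, targets in {0, 1}.
    (A fuller version would also ignore negatives adjacent to beats.)
    """
    # Max over a small window centred on each frame.
    pooled = F.max_pool1d(
        logits.unsqueeze(1),
        kernel_size=2 * tolerance + 1,
        stride=1,
        padding=tolerance,
    ).squeeze(1)

    pos = F.binary_cross_entropy_with_logits(pooled, targets, reduction="none")
    neg = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (targets * pos + (1 - targets) * neg).mean()

# Toy usage: the annotated beat at frame 2 is credited with the nearby
# peak at frame 1 instead of its own low logit.
logits = torch.tensor([[-5.0, 4.0, -5.0, -5.0]])
targets = torch.tensor([[0.0, 0.0, 1.0, 0.0]])
print(shift_tolerant_bce(logits, targets))
```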
arXiv Detail & Related papers (2024-07-31T14:59:17Z)
- BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features [19.284531698181116]
BandControlNet is designed to handle multiple parallel music sequences and to generate high-quality music samples conditioned on the given spatiotemporal control features.
The proposed BandControlNet outperforms other conditional music generation models on most objective metrics of fidelity and inference speed.
The subjective evaluations show that, even when trained on short datasets, BandControlNet can generate music of quality comparable to state-of-the-art models, while significantly outperforming them.
arXiv Detail & Related papers (2024-07-15T06:33:25Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
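A toy version of such a 2D token grid might look like the following; the token ids and track names are placeholders, and the real GETScore carries more per-cell information (e.g. separate pitch and duration rows) than this sketch.

```python
import numpy as np

PAD = 0  # placeholder id for "nothing sounds in this cell"

# One row per track, one column per time step: tracks are stacked
# vertically and time progresses horizontally.
tracks = ["melody", "bass", "drums"]
n_steps = 16
score = np.full((len(tracks), n_steps), PAD, dtype=np.int64)

score[0, 0] = 60   # melody: token for C4 at step 0 (ids are arbitrary)
score[1, 0] = 36   # bass:   token for C2 at step 0
score[2, 0] = 1    # drums:  token for a kick at step 0

# Arbitrary source-target combinations reduce to a boolean mask over rows:
# here we condition on the melody row and generate the bass and drum rows.
target_mask = np.zeros_like(score, dtype=bool)
target_mask[1:, :] = True

print(score.shape)  # (3, 16)
```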
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the computational cost.
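In sketch form, this amounts to building an attention mask in which a bar's tokens see every token of its structure-related bars, but only one summary token for each remaining bar. The helper below is a hypothetical simplification; the choice of related bars and the handling of summary tokens follow the paper only loosely, and causal masking is omitted for brevity.

```python
import numpy as np

def fine_coarse_mask(bar_of_token, related_bars, summary_token_of_bar):
    """Boolean mask[q, k] = True where query token q may attend to key k.

    bar_of_token:         bar index of every token in the flat sequence
    related_bars:         for each bar, the set of structure-related bars
                          (e.g. the 1st, 2nd, 4th, ... previous bars)
    summary_token_of_bar: index of the summary token appended to each bar
    """
    n = len(bar_of_token)
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        bq = bar_of_token[q]
        for k in range(n):
            bk = bar_of_token[k]
            if bk == bq or bk in related_bars[bq]:
                mask[q, k] = True    # fine-grained: attend to every token
            elif k == summary_token_of_bar[bk]:
                mask[q, k] = True    # coarse-grained: summary token only
    return mask

# Toy usage: 2 bars of 3 tokens each; the last token of each bar is its
# summary token, and bar 1 treats bar 0 as structure-related.
bars = [0, 0, 0, 1, 1, 1]
mask = fine_coarse_mask(bars, related_bars={0: set(), 1: {0}},
                        summary_token_of_bar={0: 2, 1: 5})
print(mask.astype(int))
```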
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
- Musika! Fast Infinite Waveform Music Generation [0.0]
We introduce Musika, a music generation system that can be trained on hundreds of hours of music using a single consumer GPU.
We achieve this by first learning a compact invertible representation of spectrogram magnitudes and phases with adversarial autoencoders.
A latent coordinate system enables generating arbitrarily long sequences of excerpts in parallel, while a global context vector allows the music to remain stylistically coherent through time.
arXiv Detail & Related papers (2022-08-18T08:31:15Z)
- Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment.
arXiv Detail & Related papers (2022-08-11T08:44:47Z)
- Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length.
Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
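A compact multitrack encoding in this spirit can be sketched as one fixed-size event per note, with the instrument carried inside the event so all tracks share a single short sequence. The field names here only loosely follow the paper's description.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    beat: int        # coarse time (which beat)
    position: int    # fine time within the beat
    pitch: int       # MIDI pitch number
    duration: int    # length in fine time units
    instrument: int  # MIDI program, so any instrument can appear anywhere

song = [
    NoteEvent(beat=0, position=0, pitch=60, duration=4, instrument=0),   # piano C4
    NoteEvent(beat=0, position=0, pitch=36, duration=8, instrument=33),  # bass C2
    NoteEvent(beat=1, position=2, pitch=64, duration=2, instrument=0),   # piano E4
]

# One event per note (instead of several tokens per note) keeps the
# sequence short even with many simultaneous instruments.
song.sort(key=lambda e: (e.beat, e.position, e.instrument))
print(len(song), "events")
```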
arXiv Detail & Related papers (2022-07-14T15:06:37Z)
- Differential Music: Automated Music Generation Using LSTM Networks with Representation Based on Melodic and Harmonic Intervals [0.0]
This paper presents a generative AI model for automated music composition with LSTM networks.
It takes a novel approach to encoding musical information, based on melodic movement rather than absolute pitch.
Experimental results are promising: the generated pieces sound musical and tonal.
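The movement-based encoding reduces to storing intervals between consecutive notes instead of the notes themselves, which makes the representation transposition-invariant. A minimal illustration (MIDI pitch numbers are used for concreteness):

```python
# Encode a melody as intervals (pitch deltas) rather than absolute pitches.
melody = [60, 62, 64, 62, 67]                  # C4 D4 E4 D4 G4
intervals = [b - a for a, b in zip(melody, melody[1:])]
print(intervals)                               # [2, 2, -2, 5]

# Decoding needs only a starting pitch, so the same interval sequence
# reproduces the tune in any key.
start = 65                                     # F4: transposed up a fourth
decoded = [start]
for step in intervals:
    decoded.append(decoded[-1] + step)
print(decoded)                                 # [65, 67, 69, 67, 72]
```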
arXiv Detail & Related papers (2021-08-23T23:51:08Z)
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGANs).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
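Flattening several tracks into one sequence might look like the sketch below: time tokens (bar, position) are shared, track tokens switch the current instrument, and note tokens follow. The token names are illustrative, not MuMIDI's exact vocabulary.

```python
# A single flat sequence covering all tracks at once, MuMIDI-style.
sequence = [
    "<bar>",
    "<pos_0>",
    "<track_melody>", "<pitch_60>", "<dur_4>",
    "<track_bass>",   "<pitch_36>", "<dur_8>",
    "<pos_4>",
    "<track_melody>", "<pitch_64>", "<dur_2>",
    "<bar>",
    "<pos_0>",
    "<track_drums>",  "<pitch_36>", "<dur_1>",
]

# One autoregressive model over this sequence generates every track in a
# single pass, but the flattened sequence is much longer than any single
# track - the long-term modeling challenge the paper addresses.
print(len(sequence), "tokens for two bars of three tracks")
```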
arXiv Detail & Related papers (2020-08-18T02:28:36Z)