Related papers: Dual-track Music Generation using Deep Learning

Dual-track Music Generation using Deep Learning

URL: http://arxiv.org/abs/2005.04353v1
Date: Sat, 9 May 2020 02:34:39 GMT
Title: Dual-track Music Generation using Deep Learning
Authors: Sudi Lyu, Anxiang Zhang, Rong Song
Abstract summary: We propose a novel dual-track architecture for generating classical piano music, which is able to model the inter-dependency of left-hand and right-hand piano music. Under two evaluation methods, we compared our models with the MuseGAN project and true music.
Score: 1.0312968200748118
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Music generation is always interesting in a sense that there is no formalized recipe. In this work, we propose a novel dual-track architecture for generating classical piano music, which is able to model the inter-dependency of left-hand and right-hand piano music. Particularly, we experimented with a lot of different models of neural network as well as different representations of music, and the results show that our proposed model outperforms all other tested methods. Besides, we deployed some special policies for model training and generation, which contributed to the model performance remarkably. Finally, under two evaluation methods, we compared our models with the MuseGAN project and true music.

Related papers

Deconstructing Jazz Piano Style Using Machine Learning [0.9933900714070033]
We focus on musical style, which benefits from a rich theoretical and mathematical analysis tradition. We train a variety of supervised-learning models to identify 20 iconic jazz musicians across a dataset of 84 hours of recordings. Our models include a novel multi-input architecture that enables four musical domains (melody, harmony, rhythm, and dynamics) to be analysed separately.
arXiv Detail & Related papers (2025-04-07T12:37:39Z)
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching. We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
arXiv Detail & Related papers (2024-10-27T15:35:41Z)
MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music. To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation) Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
A Survey of Music Generation in the Context of Interaction [3.6522809408725223]
Machine learning has been successfully used to compose and generate music, both melodies and polyphonic pieces. Most of these models are not suitable for human-machine co-creation through live interaction.
arXiv Detail & Related papers (2024-02-23T12:41:44Z)
StemGen: A music generation model that listens [9.489938613869864]
We present an alternative paradigm for producing music generation models that can listen and respond to musical context. We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture. The resulting model reaches the audio quality of state-of-the-art text-conditioned models, as well as exhibiting strong musical coherence with its context.
arXiv Detail & Related papers (2023-12-14T08:09:20Z)
From West to East: Who can understand the music of the others better? [91.78564268397139]
We leverage transfer learning methods to derive insights about similarities between different music cultures. We use two Western music datasets, two traditional/folk datasets coming from eastern Mediterranean cultures, and two datasets belonging to Indian art music. Three deep audio embedding models are trained and transferred across domains, including two CNN-based and a Transformer-based architecture, to perform auto-tagging for each target domain dataset.
arXiv Detail & Related papers (2023-07-19T07:29:14Z)
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training. Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience [4.986422167919228]
We propose an editing test to evaluate users' editing experience of music generation models in a systematic way. Results on two target styles indicate that the improvement over the baseline model can be reflected by the editing test quantitatively.
arXiv Detail & Related papers (2021-10-25T12:20:30Z)
Personalized Popular Music Generation Using Imitation and Structure [1.971709238332434]
We propose a statistical machine learning model that is able to capture and imitate the structure, melody, chord, and bass style from a given example seed song. An evaluation using 10 pop songs shows that our new representations and methods are able to create high-quality stylistic music.
arXiv Detail & Related papers (2021-05-10T23:43:00Z)
Learning to Generate Music With Sentiment [1.8275108630751844]
This paper presents a generative Deep Learning model that can be directed to compose music with a given sentiment. Besides music generation, the same model can be used for sentiment analysis of symbolic music.
arXiv Detail & Related papers (2021-03-09T03:16:52Z)
PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence. MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling. We call our system for pop music accompaniment generation as PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music. We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.