Dual-track Music Generation using Deep Learning
- URL: http://arxiv.org/abs/2005.04353v1
- Date: Sat, 9 May 2020 02:34:39 GMT
- Title: Dual-track Music Generation using Deep Learning
- Authors: Sudi Lyu, Anxiang Zhang, Rong Song
- Abstract summary: We propose a novel dual-track architecture for generating classical piano music, which is able to model the inter-dependency of left-hand and right-hand piano music.
Under two evaluation methods, we compared our models with the MuseGAN project and real music.
- Score: 1.0312968200748118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music generation is an interesting problem in the sense that there is
no formalized recipe for it. In this work, we propose a novel dual-track
architecture for generating classical piano music that models the
inter-dependency between the left-hand and right-hand parts. We experimented
with a range of neural network models and music representations, and the
results show that our proposed model outperforms all other tested methods. In
addition, we applied special training and generation policies that markedly
improved model performance. Finally, under two evaluation methods, we compared
our models with the MuseGAN project and with real music.
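The abstract above does not spell out the dual-track architecture itself, so the sketch below is only one hypothetical way to couple two tracks: a left-hand recurrent encoder whose hidden states condition a right-hand decoder. The module choices (LSTMs), token vocabulary, and dimensions are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualTrackSketch(nn.Module):
    """Illustrative dual-track model: the right-hand decoder is conditioned
    on the left-hand encoder, so the two piano parts are not generated
    independently. (Hypothetical architecture, not the paper's model.)"""

    def __init__(self, vocab_size=128, emb=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.left_rnn = nn.LSTM(emb, hidden, batch_first=True)
        self.right_rnn = nn.LSTM(emb + hidden, hidden, batch_first=True)
        self.left_head = nn.Linear(hidden, vocab_size)
        self.right_head = nn.Linear(hidden, vocab_size)

    def forward(self, left_tokens, right_tokens):
        left_h, _ = self.left_rnn(self.embed(left_tokens))
        # Feed the left-hand hidden states into the right-hand track so the
        # two parts stay inter-dependent.
        right_in = torch.cat([self.embed(right_tokens), left_h], dim=-1)
        right_h, _ = self.right_rnn(right_in)
        return self.left_head(left_h), self.right_head(right_h)

model = DualTrackSketch()
left = torch.randint(0, 128, (2, 32))   # toy left-hand token sequences
right = torch.randint(0, 128, (2, 32))  # toy right-hand token sequences
left_logits, right_logits = model(left, right)
print(left_logits.shape, right_logits.shape)  # (2, 32, 128) twice
```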
Related papers
- MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching.
We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
arXiv Detail & Related papers (2024-10-27T15:35:41Z)
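As a rough illustration of the flow-matching objective named in the MusicFlow entry above (not the paper's actual cascade, masking scheme, or conditioning), the sketch below trains a toy velocity-prediction network on a straight interpolation path between noise and data; the network shape and the text-embedding placeholder are assumptions.

```python
import torch
import torch.nn as nn

# Toy velocity network: predicts dx_t/dt from (x_t, t, condition).
net = nn.Sequential(nn.Linear(64 + 1 + 16, 256), nn.ReLU(), nn.Linear(256, 64))

def flow_matching_loss(x1, cond):
    """Conditional flow matching on a straight path x_t = (1-t)x0 + t*x1,
    whose target velocity is simply x1 - x0."""
    x0 = torch.randn_like(x1)                 # noise sample
    t = torch.rand(x1.size(0), 1)             # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the path
    target_v = x1 - x0                        # ground-truth velocity
    pred_v = net(torch.cat([xt, t, cond], dim=-1))
    return ((pred_v - target_v) ** 2).mean()

x1 = torch.randn(8, 64)    # stand-in for music latents
cond = torch.randn(8, 16)  # stand-in for text embeddings
loss = flow_matching_loss(x1, cond)
loss.backward()
print(float(loss))
```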
- Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models [9.311353871322325]
Mozart's Touch is composed of three main components: Multi-modal Captioning Module, Large Language Model (LLM) Understanding & Bridging Module, and Music Generation Module.
Unlike traditional approaches, Mozart's Touch requires no training or fine-tuning of pre-trained models, offering efficiency and transparency through clear, interpretable prompts.
arXiv Detail & Related papers (2024-05-05T03:15:52Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
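The exact SMT-ABC layout is defined in the MuPT paper; the sketch below only illustrates the underlying idea of keeping measures from different tracks aligned by interleaving them bar by bar into a single sequence. The delimiter tokens and data layout are hypothetical.

```python
def interleave_tracks(tracks):
    """Interleave the i-th bar of every track before moving to bar i+1,
    so corresponding measures cannot drift apart during generation.
    `tracks` is a list of tracks, each a list of ABC bar strings."""
    n_bars = min(len(t) for t in tracks)
    out = []
    for i in range(n_bars):
        for track_id, track in enumerate(tracks):
            out.append(f"<track{track_id}> {track[i]} |")
        out.append("<bar>")            # hypothetical bar-boundary token
    return " ".join(out)

right_hand = ["C E G c", "B, D G B", "A, C E A"]
left_hand  = ["C,2 G,2", "G,,2 D,2", "A,,2 E,2"]
print(interleave_tracks([right_hand, left_hand]))
```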
- A Survey of Music Generation in the Context of Interaction [3.6522809408725223]
Machine learning has been successfully used to compose and generate music, both melodies and polyphonic pieces.
Most of these models are not suitable for human-machine co-creation through live interaction.
arXiv Detail & Related papers (2024-02-23T12:41:44Z)
- StemGen: A music generation model that listens [9.489938613869864]
We present an alternative paradigm for producing music generation models that can listen and respond to musical context.
We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture.
The resulting model reaches the audio quality of state-of-the-art text-conditioned models, as well as exhibiting strong musical coherence with its context.
arXiv Detail & Related papers (2023-12-14T08:09:20Z)
- From West to East: Who can understand the music of the others better? [91.78564268397139]
We leverage transfer learning methods to derive insights about similarities between different music cultures.
We use two Western music datasets, two traditional/folk datasets coming from eastern Mediterranean cultures, and two datasets belonging to Indian art music.
Three deep audio embedding models, two CNN-based and one Transformer-based, are trained and transferred across domains to perform auto-tagging on each target-domain dataset.
arXiv Detail & Related papers (2023-07-19T07:29:14Z)
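As a generic illustration of the transfer setup in the entry above (the paper's actual backbones, datasets, and tag vocabularies are not reproduced here), the sketch below freezes a stand-in pretrained embedding model and trains a small multi-label tagging head for a target-domain dataset; every module and size is a placeholder.

```python
import torch
import torch.nn as nn

# Placeholder for a source-domain audio embedding model (e.g. a CNN
# pre-trained on Western music); here it is just a random projection.
pretrained_backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU())
for p in pretrained_backbone.parameters():
    p.requires_grad = False            # freeze the transferred encoder

n_target_tags = 20                     # tag count of the target-domain dataset
tag_head = nn.Linear(256, n_target_tags)
criterion = nn.BCEWithLogitsLoss()     # multi-label auto-tagging loss
optimizer = torch.optim.Adam(tag_head.parameters(), lr=1e-3)

features = torch.randn(16, 128)        # stand-in for audio features
labels = torch.randint(0, 2, (16, n_target_tags)).float()

logits = tag_head(pretrained_backbone(features))
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
print(float(loss))
```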
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
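MERT's real teachers and model configurations are described in the paper; the following is only a schematic of masked pseudo-label prediction, in which a teacher assigns discrete labels to audio frames and a student transformer is trained to predict them at masked positions. The random "teacher" labels, masking rate, and shapes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

vocab = 500                                  # pseudo-label codebook size
student = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=2)
head = nn.Linear(128, vocab)

frames = torch.randn(4, 100, 128)            # stand-in acoustic features
# Stand-in "teacher": in MERT-style training these labels would come from
# teacher models; here they are random for illustration only.
pseudo_labels = torch.randint(0, vocab, (4, 100))

mask = torch.rand(4, 100) < 0.3              # mask ~30% of the frames
masked_frames = frames.clone()
masked_frames[mask] = 0.0                    # zero out masked frames

logits = head(student(masked_frames))
loss = nn.functional.cross_entropy(logits[mask], pseudo_labels[mask])
loss.backward()
print(float(loss))
```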
- Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience [4.986422167919228]
We propose an editing test to evaluate users' editing experience of music generation models in a systematic way.
Results on two target styles indicate that the improvement over the baseline model can be reflected by the editing test quantitatively.
arXiv Detail & Related papers (2021-10-25T12:20:30Z)
- Learning to Generate Music With Sentiment [1.8275108630751844]
This paper presents a generative Deep Learning model that can be directed to compose music with a given sentiment.
Besides music generation, the same model can be used for sentiment analysis of symbolic music.
arXiv Detail & Related papers (2021-03-09T03:16:52Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
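The actual MuMIDI vocabulary compresses note attributes and is defined in the PopMAG paper; the sketch below only shows the general idea of flattening notes from several tracks into one token sequence with explicit bar and track tokens. The token names and data layout are made up for illustration.

```python
def encode_multitrack(bars):
    """Flatten a multi-track piece into a single token sequence.
    `bars` is a list of bars; each bar maps a track name to a list of
    (pitch, duration) tuples. Track and bar tokens keep the flattened
    sequence unambiguous so all tracks can be generated together."""
    tokens = []
    for bar in bars:
        tokens.append("<bar>")
        for track, notes in bar.items():
            tokens.append(f"<track:{track}>")
            for pitch, dur in notes:
                tokens += [f"<pitch:{pitch}>", f"<dur:{dur}>"]
    return tokens

piece = [
    {"melody": [(72, 4), (74, 4)], "bass": [(48, 8)]},
    {"melody": [(76, 8)],          "bass": [(43, 8)]},
]
print(encode_multitrack(piece))
```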
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.