Expressive MIDI-format Piano Performance Generation
- URL: http://arxiv.org/abs/2408.00900v1
- Date: Thu, 1 Aug 2024 20:36:37 GMT
- Title: Expressive MIDI-format Piano Performance Generation
- Authors: Jingwei Liu
- Abstract summary: This work presents a generative neural network that is able to generate expressive piano performance in MIDI format.
The musical expressivity is reflected in vivid micro-timing, rich polyphonic texture, varied dynamics, and sustain-pedal effects.
- Score: 4.549093083765949
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a generative neural network that is able to generate expressive piano performance in MIDI format. The musical expressivity is reflected in vivid micro-timing, rich polyphonic texture, varied dynamics, and sustain-pedal effects. The model is innovative in many aspects, from data processing to neural network design. We claim that this symbolic music generation model overcomes the common criticisms of symbolic music and is able to generate expressive musical flows as good as, if not better than, generations with raw audio. One drawback is that, due to the limited time before submission, the model was not fine-tuned or sufficiently trained, so the generation may sound incoherent and random at certain points. Despite that, the model shows a powerful ability to generate expressive piano pieces.
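For readers unfamiliar with how these expressive attributes map onto the MIDI format, the sketch below writes a short fragment whose onset micro-timing, note velocity (dynamics), and sustain-pedal state are set explicitly. It uses the pretty_midi library purely for illustration; the library choice, note values, and timings are assumptions for the example and are not part of the paper's model.
```python
# Illustrative sketch only: encodes micro-timing, dynamics, and sustain pedal
# in a MIDI file. pretty_midi and all values here are assumptions for the
# example, not the paper's pipeline.
import pretty_midi

pm = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)  # Acoustic Grand Piano

# Micro-timing: onsets are real-valued seconds, not quantized to a beat grid.
# Dynamics: velocity varies per note.
piano.notes.append(pretty_midi.Note(velocity=72, pitch=60, start=0.000, end=0.480))
piano.notes.append(pretty_midi.Note(velocity=96, pitch=64, start=0.512, end=1.050))

# Sustain pedal: MIDI CC 64 (a value >= 64 means pedal down).
piano.control_changes.append(pretty_midi.ControlChange(number=64, value=127, time=0.0))
piano.control_changes.append(pretty_midi.ControlChange(number=64, value=0, time=1.2))

pm.instruments.append(piano)
pm.write('expressive_fragment.mid')
```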
Related papers
- MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching.
We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
arXiv Detail & Related papers (2024-10-27T15:35:41Z) - PerTok: Expressive Encoding and Modeling of Symbolic Musical Ideas and Variations [0.3683202928838613]
Cadenza is a new multi-stage generative framework for predicting expressive variations of symbolic musical ideas.
The proposed framework comprises two sequential stages: 1) Composer and 2) Performer.
Our framework is designed, researched and implemented with the objective of providing inspiration for musicians.
arXiv Detail & Related papers (2024-10-02T22:11:31Z) - Noise2Music: Text-conditioned Music Generation with Diffusion Models [73.74580231353684]
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.
We find that the generated audio faithfully reflects key elements of the text prompt such as genre, tempo, instruments, mood, and era.
Pretrained large language models play a key role in this story -- they are used to generate paired text for the audio of the training set and to extract embeddings of the text prompts ingested by the diffusion models.
arXiv Detail & Related papers (2023-02-08T07:27:27Z) - Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models [2.569647910019739]
This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data.
The evaluation indicates that adversarial training produces more aesthetically pleasing music.
arXiv Detail & Related papers (2022-11-01T20:23:49Z) - Mel Spectrogram Inversion with Stable Pitch [0.0]
Vocoders are models capable of transforming a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, to a waveform.
Recent vocoder models developed for speech achieve a high degree of realism.
Compared to speech, the structure of the musical sound texture offers new challenges.
arXiv Detail & Related papers (2022-08-26T17:01:57Z) - Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z) - Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music [0.25559196081940677]
This paper proposes a bi-directional LSTM model with an attention mechanism capable of generating music of a similar type based on MIDI data.
The music generated by the model follows the theme/style of the music the model is trained on.
arXiv Detail & Related papers (2020-11-02T06:43:28Z) - Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z) - Foley Music: Learning to Generate Music from Videos [115.41099127291216]
Foley Music is a system that can synthesize plausible music for a silent video clip of people playing musical instruments.
We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings.
We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements.
arXiv Detail & Related papers (2020-07-21T17:59:06Z) - Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance [6.531546527140474]
The paper proposes a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE).
We demonstrate how the model is able to apply fine-grained style morphing over the course of the audio.
arXiv Detail & Related papers (2020-06-16T12:54:41Z) - RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement
Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.