Emotion-Conditioned Melody Harmonization with Hierarchical Variational
Autoencoder
- URL: http://arxiv.org/abs/2306.03718v4
- Date: Thu, 20 Jul 2023 03:06:50 GMT
- Title: Emotion-Conditioned Melody Harmonization with Hierarchical Variational
Autoencoder
- Authors: Shulei Ji and Xinyu Yang
- Abstract summary: We propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional conditions on melody harmonization.
LHVAE incorporates latent variables and emotional conditions at different levels to model the global and local music properties.
Objective experimental results show that our proposed model outperforms other LSTM-based models.
- Score: 11.635877697635449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing melody harmonization models have made great progress in improving
the quality of generated harmonies, but most of them ignored the emotions
beneath the music. Meanwhile, the variability of harmonies generated by
previous methods is insufficient. To solve these problems, we propose a novel
LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the
influence of emotional conditions on melody harmonization, while improving the
quality of generated harmonies and capturing the abundant variability of chord
progressions. Specifically, LHVAE incorporates latent variables and emotional
conditions at different levels (piece- and bar-level) to model the global and
local music properties. Additionally, we introduce an attention-based melody
context vector at each step to better learn the correspondence between melodies
and harmonies. Objective experimental results show that our proposed model
outperforms other LSTM-based models. Through subjective evaluation, we conclude
that only altering the types of chords hardly changes the overall emotion of
the music. The qualitative analysis demonstrates the ability of our model to
generate variable harmonies.
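As a rough illustration of the hierarchical latent structure described above, the following numpy sketch shows piece- and bar-level latents sampled via the VAE reparameterization trick and concatenated with an emotion condition before decoding. This is a sketch only: the actual LHVAE uses LSTM encoders/decoders and learned posteriors, and all dimensions and the one-hot emotion encoding here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps (the VAE reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Hypothetical dimensions: one piece-level latent, one latent per bar.
n_bars, d_piece, d_bar = 8, 16, 8

# Piece-level latent models global properties of the whole piece.
z_piece = reparameterize(np.zeros(d_piece), np.zeros(d_piece), rng)

# Bar-level latents model local variation; each bar's decoder input is
# conditioned on the piece-level latent and an emotion label by concatenation.
emotion = np.array([1.0, 0.0])  # illustrative one-hot, e.g. [positive, negative]
z_bars = [reparameterize(np.zeros(d_bar), np.zeros(d_bar), rng)
          for _ in range(n_bars)]
decoder_input = [np.concatenate([z_piece, z_b, emotion]) for z_b in z_bars]

print(len(decoder_input), decoder_input[0].shape)  # 8 bars, each of size 16 + 8 + 2
```

Sampling a fresh bar-level latent per bar is what lets a model of this shape produce variable chord progressions for the same melody while the shared piece-level latent keeps the harmonization globally coherent.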
Related papers
- Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation [16.790582113573453]
Emotion-driven melody harmonization aims to generate diverse harmonies for a single melody to convey desired emotions.
Previous research found it hard to alter the perceived emotional valence of lead sheets only by harmonizing the same melody with different chords.
In this paper, we propose a novel functional representation for symbolic music.
arXiv Detail & Related papers (2024-07-29T17:05:12Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction [36.45127093978295]
We propose an input feature modification and a training objective modification based on two assumptions.
To enhance the model's sensitivity on the trailing harmonics, we modify the Combined Frequency and Periodicity representation using discrete z-transform.
We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, modified from a piano transcription network.
arXiv Detail & Related papers (2023-08-04T21:59:40Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- BacHMMachine: An Interpretable and Scalable Model for Algorithmic Harmonization for Four-part Baroque Chorales [23.64897650817862]
BacHMMachine employs a "theory-driven" framework guided by music composition principles.
It provides a probabilistic framework for learning key modulations and chordal progressions from a given melodic line.
It results in vast decreases in computational burden and greater interpretability.
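BacHMMachine's probabilistic framing suggests a hidden Markov model whose hidden states are chords and whose observations are melody notes. A toy Viterbi decode under made-up transition and emission tables illustrates the idea; all probabilities and the tiny note alphabet here are illustrative assumptions, not the paper's learned parameters.

```python
import numpy as np

# Toy HMM: hidden states are chords, observations are melody pitch classes.
chords = ["C", "F", "G"]
trans = np.array([[0.5, 0.25, 0.25],   # P(next chord | current chord)
                  [0.3, 0.40, 0.30],
                  [0.6, 0.20, 0.20]])
# P(melody note | chord) over a tiny 3-note alphabet {C, E, G} -> indices 0..2
emit = np.array([[0.5, 0.3, 0.2],
                 [0.4, 0.2, 0.4],
                 [0.2, 0.3, 0.5]])
start = np.array([0.6, 0.2, 0.2])

def viterbi(obs):
    """Most likely chord sequence for an observed melody (log-space Viterbi)."""
    delta = np.log(start) + np.log(emit[:, obs[0]])
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + np.log(trans)   # scores[prev, next]
        back.append(scores.argmax(axis=0))        # best previous chord per next
        delta = scores.max(axis=0) + np.log(emit[:, o])
    path = [int(delta.argmax())]
    for ptr in reversed(back):                    # backtrack to recover the path
        path.append(int(ptr[path[-1]]))
    path.reverse()
    return [chords[i] for i in path]

print(viterbi([0, 1, 2, 0]))  # harmonize the melody C-E-G-C
```

Because decoding reduces to dynamic programming over a small chord vocabulary rather than neural sequence search, this framing is consistent with the reduced computational burden and interpretability the paper reports.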
arXiv Detail & Related papers (2021-09-15T23:39:45Z)
- DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain which iteratively converts noise into a mel-spectrogram conditioned on the music score.
The evaluations conducted on the Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work with a notable margin.
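The Markov chain mentioned above follows the standard denoising diffusion setup; a minimal numpy sketch of the forward (noising) half of that chain is below. The schedule values and spectrogram shape are generic DDPM-style assumptions, not DiffSinger's actual hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule over T steps (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)   # cumulative signal retention per step

def q_sample(x0, t, rng):
    """Forward process: noise a clean mel-spectrogram x0 to diffusion step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = rng.standard_normal((80, 64))  # toy "mel-spectrogram": 80 bins x 64 frames
x_T = q_sample(x0, T - 1, rng)

# By t = T-1 nearly all signal is destroyed; synthesis runs this chain in
# reverse, denoising step by step while conditioning on the music score.
```

The learned model only has to predict the noise at each step; chaining those predictions backwards from pure noise yields the conditioned mel-spectrogram.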
arXiv Detail & Related papers (2021-05-06T05:21:42Z)
- A framework to compare music generative models using automatic evaluation metrics extended to rhythm [69.2737664640826]
This paper builds on a framework from previous research that did not consider rhythm, makes a series of design decisions, and then adds rhythm support to evaluate the performance of two RNN memory cells in the creation of monophonic music.
The model considers the handling of music transposition and the framework evaluates the quality of the generated pieces using automatic quantitative metrics based on geometry which have rhythm support added as well.
arXiv Detail & Related papers (2021-01-19T15:04:46Z)
- HpRNet: Incorporating Residual Noise Modeling for Violin in a Variational Parametric Synthesizer [11.4219428942199]
We introduce a dataset of Carnatic Violin Recordings where bow noise is an integral part of the playing style of higher pitched notes.
We obtain insights about each of the harmonic and residual components of the signal, as well as their interdependence.
arXiv Detail & Related papers (2020-08-19T12:48:32Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user specified symbolic scenario combined with a previous music context.
Our model is capable of generating long melodies by regarding 8-beat note sequences as basic units, and shares consistent rhythm pattern structure with another specific song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.