MusER: Musical Element-Based Regularization for Generating Symbolic
Music with Emotion
- URL: http://arxiv.org/abs/2312.10307v2
- Date: Tue, 2 Jan 2024 02:36:22 GMT
- Authors: Shulei Ji and Xinyu Yang
- Abstract summary: We present a novel approach employing musical element-based regularization in the latent space to disentangle distinct elements.
By visualizing the latent space, we conclude that MusER yields a disentangled and interpretable latent space.
Experimental results demonstrate that MusER outperforms the state-of-the-art models for generating emotional music.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating music with emotion is an important task in automatic music
generation, in which emotion is evoked through a variety of musical elements
(such as pitch and duration) that change over time and collaborate with each
other. However, prior research on deep learning-based emotional music
generation has rarely explored the contribution of different musical elements
to emotions, let alone the deliberate manipulation of these elements to alter
the emotion of music, which is not conducive to fine-grained element-level
control over emotions. To address this gap, we present a novel approach
employing musical element-based regularization in the latent space to
disentangle distinct elements, investigate their roles in distinguishing
emotions, and further manipulate elements to alter musical emotions.
Specifically, we propose a novel VQ-VAE-based model named MusER. MusER
incorporates a regularization loss to enforce the correspondence between the
musical element sequences and the specific dimensions of latent variable
sequences, providing a new solution for disentangling discrete sequences.
Taking advantage of the disentangled latent vectors, a two-level decoding
strategy that includes multiple decoders attending to latent vectors with
different semantics is devised to better predict the elements. By visualizing
the latent space, we conclude that MusER yields a disentangled and interpretable
latent space and gain insights into the contribution of distinct elements to
the emotional dimensions (i.e., arousal and valence). Experimental results
demonstrate that MusER outperforms the state-of-the-art models for generating
emotional music in both objective and subjective evaluation. Besides, we
rearrange music through element transfer and attempt to alter the emotion of
music by transferring emotion-distinguishable elements.
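The paper does not spell out the regularization loss here. As a minimal, hypothetical sketch of the idea (the dimension assignments, MSE distance, and normalized element values are all assumptions of this illustration; the actual MusER loss operates on VQ-VAE latent sequences), pinning each musical element to a reserved latent dimension might look like:

```python
import numpy as np

def element_regularization_loss(latents, elements, dim_map):
    """Hypothetical element-based regularization: tie each musical
    element sequence (e.g. pitch, duration) to one reserved dimension
    of the latent vector sequence, so that elements end up
    disentangled along known axes of the latent space.

    latents  : (T, D) array, one latent vector per time step
    elements : dict, element name -> (T,) normalized value sequence
    dim_map  : dict, element name -> reserved latent dimension index
    """
    loss = 0.0
    for name, values in elements.items():
        d = dim_map[name]
        # penalize mismatch between the reserved dimension and the element
        loss += np.mean((latents[:, d] - values) ** 2)
    return loss / len(elements)

# toy check: 4 time steps, 8-dim latents, two elements
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
elems = {"pitch": np.array([0.1, 0.5, 0.9, 0.3]),
         "duration": np.array([0.2, 0.2, 0.6, 0.4])}
print(element_regularization_loss(z, elems, {"pitch": 0, "duration": 1}))
```

Minimizing such a term alongside the usual reconstruction and codebook losses would make dimension 0 track pitch and dimension 1 track duration, which is what lets the decoders (and later element transfer) address one element at a time.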
Related papers
- Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation
Emotion-driven melody harmonization aims to generate diverse harmonies for a single melody to convey desired emotions.
Previous research found it hard to alter the perceived emotional valence of lead sheets only by harmonizing the same melody with different chords.
In this paper, we propose a novel functional representation for symbolic music.
arXiv (2024-07-29)
- Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach
We introduce a novel way to manipulate the emotional content of a song using AI tools.
Our goal is to achieve the desired emotion while leaving the original melody as intact as possible.
This research may contribute to on-demand custom music generation, the automated remixing of existing work, and music playlists tuned for emotional progression.
arXiv (2024-06-12)
- Are Words Enough? On the semantic conditioning of affective music generation
This scoping review aims to analyze and discuss the possibilities of music generation conditioned by emotions.
In detail, we review two main paradigms adopted in automatic music generation: rules-based and machine-learning models.
We conclude that overcoming the limitation and ambiguity of language to express emotions through music has the potential to impact the creative industries.
arXiv (2023-11-07)
- REMAST: Real-time Emotion-based Music Arrangement with Soft Transition
Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies.
We propose REMAST to achieve real-time emotion fit and smooth transition simultaneously.
According to the evaluation results, REMAST surpasses the state-of-the-art methods in objective and subjective metrics.
arXiv (2023-05-14)
- Speech Synthesis with Mixed Emotions
We propose a novel formulation that measures the relative difference between the speech samples of different emotions.
We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.
At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector.
arXiv (2022-08-11)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music.
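PEMR's exact objective is not given here; as a generic illustration of a contrastive objective over positive and negative views of the same clip (an InfoNCE-style loss, which is an assumption of this sketch rather than PEMR's published formulation), the shape of such a loss is:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the anchor embedding
    toward its positive view and push it away from negative views.
    All inputs are assumed to be L2-normalized vectors."""
    def sim(a, b):
        return (a @ b) / temperature
    logits = np.array([sim(anchor, positive)] +
                      [sim(anchor, n) for n in negatives])
    # cross-entropy with the positive placed at index 0
    return -logits[0] + np.log(np.exp(logits).sum())

a = np.array([1.0, 0.0])
pos = np.array([1.0, 0.0])     # well-aligned positive -> loss near 0
neg = [np.array([0.0, 1.0])]   # orthogonal negative
print(info_nce(a, pos, neg))
```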
arXiv (2022-03-17)
- A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition
Symbolic Music Emotion Recognition (SMER) predicts music emotion from symbolic data, such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
arXiv (2022-01-15)
- Emotion Intensity and its Control for Emotional Voice Conversion
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv (2022-01-10)
- A Circular-Structured Representation for Visual Emotion Distribution Learning
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv (2021-06-23)
- Musical Prosody-Driven Emotion Classification: Interpreting Vocalists' Portrayal of Emotions Through Machine Learning
The role of musical prosody remains under-explored despite several studies demonstrating a strong connection between prosody and emotion.
In this study, we restrict the input of traditional machine learning algorithms to the features of musical prosody.
We utilize a methodology for individual data collection from vocalists, and personal ground truth labeling by the artist themselves.
arXiv (2021-06-04)
- Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space
Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger.
Existing emotion-based image and music matching methods either employ limited categorical emotion states or train the matching model using an impractical multi-stage pipeline.
In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space.
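That paper trains the matching end to end; as a toy illustration of the matching step only (nearest-neighbor retrieval by Euclidean distance in the VA plane is an assumption of this sketch, not the paper's learned model), matching images to music in continuous VA space looks like:

```python
import numpy as np

def match_by_va(image_va, music_va):
    """Match each image to the music clip whose predicted
    (valence, arousal) point is nearest in the continuous VA plane.
    image_va: (N, 2) array, music_va: (M, 2) array."""
    # pairwise Euclidean distances in VA space, shape (N, M)
    d = np.linalg.norm(image_va[:, None, :] - music_va[None, :, :], axis=-1)
    return d.argmin(axis=1)  # index of the closest clip per image

imgs = np.array([[0.8, 0.7],    # high valence/arousal image
                 [-0.5, -0.2]]) # low valence image
clips = np.array([[-0.6, -0.1], [0.7, 0.8]])
print(match_by_va(imgs, clips))  # → [1 0]
```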
arXiv (2020-08-22)
This list is automatically generated from the titles and abstracts of the papers in this site.