EmoGen: Eliminating Subjective Bias in Emotional Music Generation
- URL: http://arxiv.org/abs/2307.01229v1
- Date: Mon, 3 Jul 2023 05:54:29 GMT
- Title: EmoGen: Eliminating Subjective Bias in Emotional Music Generation
- Authors: Chenfei Kang, Peiling Lu, Botao Yu, Xu Tan, Wei Ye, Shikun Zhang,
Jiang Bian
- Abstract summary: EmoGen is an emotional music generation system that leverages a set of emotion-related music attributes as the bridge between emotion and music.
Both subjective and objective evaluations show that EmoGen outperforms previous methods on emotion control accuracy and music quality.
- Score: 34.910412265225816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music is used to convey emotions, and thus generating emotional music is
important in automatic music generation. Previous work on emotional music
generation directly uses annotated emotion labels as control signals, which
suffers from subjective bias: different people may annotate different emotions
on the same music, and one person may feel different emotions under different
situations. Therefore, directly mapping emotion labels to music sequences in an
end-to-end way would confuse the learning process and hinder the model from
generating music with general emotions. In this paper, we propose EmoGen, an
emotional music generation system that leverages a set of emotion-related music
attributes as the bridge between emotion and music, and divides the generation
into two stages: emotion-to-attribute mapping with supervised clustering, and
attribute-to-music generation with self-supervised learning. Both stages are
beneficial: in the first stage, the attribute values around the clustering
center represent the general emotions of these samples, which help eliminate
the impacts of the subjective bias of emotion labels; in the second stage, the
generation is completely disentangled from emotion labels and thus free from
the subjective bias. Both subjective and objective evaluations show that EmoGen
outperforms previous methods in emotion control accuracy and music quality,
demonstrating its superiority in generating emotional music.
Music samples generated by EmoGen are available at
https://ai-muzic.github.io/emogen/, and the code is available at
https://github.com/microsoft/muzic/.
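The two-stage design described in the abstract can be sketched in code. This is a minimal illustration only: the attribute names, cluster-center values, and the generator stub below are assumptions for demonstration, not the attributes or values used in the paper.

```python
import numpy as np

# Stage 1: emotion-to-attribute mapping. Each emotion label is mapped to the
# attribute vector at the center of its supervised cluster, so the generator
# never conditions on the (subjectively biased) emotion labels directly.
ATTRIBUTE_NAMES = ["tempo", "note_density", "mean_pitch", "major_ratio"]

# Illustrative cluster centers (placeholder values, not from the paper).
EMOTION_CLUSTER_CENTERS = {
    "happy": np.array([0.8, 0.7, 0.6, 0.9]),
    "sad":   np.array([0.3, 0.4, 0.4, 0.2]),
}

def emotion_to_attributes(emotion: str) -> np.ndarray:
    """Stage 1: return the cluster-center attribute vector for an emotion."""
    return EMOTION_CLUSTER_CENTERS[emotion]

def attributes_to_music(attributes: np.ndarray) -> list:
    """Stage 2 (stub): a self-supervised, attribute-conditioned generator
    would go here. We just echo the conditioning tokens to show the interface,
    which is fully disentangled from emotion labels."""
    return [f"{name}={value:.2f}" for name, value in zip(ATTRIBUTE_NAMES, attributes)]

# Generation is driven only by attributes, never by the emotion label itself.
tokens = attributes_to_music(emotion_to_attributes("happy"))
print(tokens)
```

Because stage 2 sees only attribute values, any conditioning signal (including hand-picked attributes) can drive generation, which is what frees the model from label bias.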
Related papers
- Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach [0.0]
We introduce a novel way to manipulate the emotional content of a song using AI tools.
Our goal is to achieve the desired emotion while leaving the original melody as intact as possible.
This research may contribute to on-demand custom music generation, the automated remixing of existing work, and music playlists tuned for emotional progression.
arXiv Detail & Related papers (2024-06-12T20:12:29Z)
- EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation [34.5592743467339]
We propose a visual attribute-guided audio decoupler to generate fine-grained facial animations.
To achieve more precise emotional expression, we introduce a fine-grained emotion coefficient prediction module.
Our proposed method, EmoSpeaker, outperforms existing emotional talking face generation methods in terms of expression variation and lip synchronization.
arXiv Detail & Related papers (2024-02-02T14:04:18Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion [16.658813060879293]
We present a novel approach employing musical element-based regularization in the latent space to disentangle distinct elements.
By visualizing latent space, we conclude that MusER yields a disentangled and interpretable latent space.
Experimental results demonstrate that MusER outperforms the state-of-the-art models for generating emotional music.
arXiv Detail & Related papers (2023-12-16T03:50:13Z)
- Are Words Enough? On the semantic conditioning of affective music generation [1.534667887016089]
This scoping review aims to analyze and discuss the possibilities of music generation conditioned by emotions.
In detail, we review two main paradigms adopted in automatic music generation: rules-based and machine-learning models.
We conclude that overcoming the limitation and ambiguity of language to express emotions through music has the potential to impact the creative industries.
arXiv Detail & Related papers (2023-11-07T00:19:09Z)
- REMAST: Real-time Emotion-based Music Arrangement with Soft Transition [29.34094293561448]
Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies.
We propose REMAST to achieve emotion real-time fit and smooth transition simultaneously.
According to the evaluation results, REMAST surpasses the state-of-the-art methods in objective and subjective metrics.
arXiv Detail & Related papers (2023-05-14T00:09:48Z)
- Speech Synthesis with Mixed Emotions [77.05097999561298]
We propose a novel formulation that measures the relative difference between the speech samples of different emotions.
We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.
At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector.
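The run-time control described above, a manually defined emotion attribute vector blending per-emotion representations, can be sketched as follows. The embedding values and emotion names are placeholders, not the paper's learned representations.

```python
import numpy as np

# Illustrative per-emotion embeddings (placeholder values only).
EMOTION_EMBEDDINGS = {
    "neutral": np.array([0.0, 0.0]),
    "happy":   np.array([1.0, 0.2]),
    "sad":     np.array([-0.8, 0.5]),
}

def mix_emotions(weights: dict) -> np.ndarray:
    """Blend emotion embeddings by a manually defined attribute vector.
    Weights are normalized so the mixture is a convex combination."""
    total = sum(weights.values())
    return sum((w / total) * EMOTION_EMBEDDINGS[e] for e, w in weights.items())

# e.g. a mixture of 70% happy and 30% sad
mixed = mix_emotions({"happy": 0.7, "sad": 0.3})
```

A convex combination keeps the mixed embedding inside the span of the known emotions, which is one simple way to realize "desired emotion mixtures" at inference time.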
arXiv Detail & Related papers (2022-08-11T15:45:58Z)
- Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z)
- A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
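A circular emotion representation like the one summarized above can be sketched geometrically. The abstract only says each emotion vector is defined by three attributes; the specific choice here (angle for emotion type, radius for intensity, sign for polarity) is an assumption for illustration.

```python
import math

def emotion_vector(angle_deg: float, intensity: float) -> tuple:
    """Place an emotional state on a unit circle: the angle encodes the
    emotion type, the radius encodes intensity, and polarity is derived
    from which half of the circle the state falls in (an assumption)."""
    rad = math.radians(angle_deg)
    x = intensity * math.cos(rad)
    y = intensity * math.sin(rad)
    polarity = 1 if x >= 0 else -1
    return (x, y, polarity)

# e.g. a moderately intense positive emotion at 30 degrees
v = emotion_vector(30.0, 0.6)
```

Unifying all emotional states on one circle makes distances and mixtures between emotions directly comparable, which is the appeal of such a representation for distribution learning.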
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
- MIME: MIMicking Emotions for Empathetic Response Generation [82.57304533143756]
Current approaches to empathetic response generation view the set of emotions expressed in the input text as a flat structure.
We argue that empathetic responses often mimic the emotion of the user to a varying degree, depending on its positivity or negativity and content.
arXiv Detail & Related papers (2020-10-04T00:35:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.