REMAST: Real-time Emotion-based Music Arrangement with Soft Transition
- URL: http://arxiv.org/abs/2305.08029v3
- Date: Sat, 27 Jul 2024 01:52:23 GMT
- Title: REMAST: Real-time Emotion-based Music Arrangement with Soft Transition
- Authors: Zihao Wang, Le Ma, Chen Zhang, Bo Han, Yunfei Xu, Yikai Wang, Xinyi Chen, HaoRong Hong, Wenbo Liu, Xinda Wu, Kejun Zhang,
- Abstract summary: Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies.
We propose REMAST to achieve emotion real-time fit and smooth transition simultaneously.
According to the evaluation results, REMAST surpasses the state-of-the-art methods in objective and subjective metrics.
- Score: 29.34094293561448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies. However, music needs real-time arrangement according to changing emotions, bringing challenges to balance emotion real-time fit and soft emotion transition due to the fine-grained and mutable nature of the target emotion. Existing studies mainly focus on achieving emotion real-time fit, while the issue of smooth transition remains understudied, affecting the overall emotional coherence of the music. In this paper, we propose REMAST to address this trade-off. Specifically, we recognize the last timestep's music emotion and fuse it with the current timestep's input emotion. The fused emotion then guides REMAST to generate the music based on the input melody. To adjust music similarity and emotion real-time fit flexibly, we downsample the original melody and feed it into the generation model. Furthermore, we design four music theory features by domain knowledge to enhance emotion information and employ semi-supervised learning to mitigate the subjective bias introduced by manual dataset annotation. According to the evaluation results, REMAST surpasses the state-of-the-art methods in objective and subjective metrics. These results demonstrate that REMAST achieves real-time fit and smooth transition simultaneously, enhancing the coherence of the generated music.
Related papers
- Audio-Driven Emotional 3D Talking-Head Generation [47.6666060652434]
We present a novel system for synthesizing high-fidelity, audio-driven video portraits with accurate emotional expressions.
We propose a pose sampling method that generates natural idle-state (non-speaking) videos in response to silent audio inputs.
arXiv Detail & Related papers (2024-10-07T08:23:05Z) - Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach [0.0]
We introduce a novel way to manipulate the emotional content of a song using AI tools.
Our goal is to achieve the desired emotion while leaving the original melody as intact as possible.
This research may contribute to on-demand custom music generation, the automated remixing of existing work, and music playlists tuned for emotional progression.
arXiv Detail & Related papers (2024-06-12T20:12:29Z) - MusER: Musical Element-Based Regularization for Generating Symbolic
Music with Emotion [16.658813060879293]
We present a novel approach employing musical element-based regularization in the latent space to disentangle distinct elements.
By visualizing latent space, we conclude that MusER yields a disentangled and interpretable latent space.
Experimental results demonstrate that MusER outperforms the state-of-the-art models for generating emotional music.
arXiv Detail & Related papers (2023-12-16T03:50:13Z) - Exploring the Emotional Landscape of Music: An Analysis of Valence
Trends and Genre Variations in Spotify Music Data [0.0]
This paper conducts an intricate analysis of musical emotions and trends using Spotify music data.
Employing regression modeling, temporal analysis, mood transitions, and genre investigation, the study uncovers patterns within music-emotion relationships.
arXiv Detail & Related papers (2023-10-29T15:57:31Z) - EmoGen: Eliminating Subjective Bias in Emotional Music Generation [34.910412265225816]
EmoGen is an emotional music generation system that leverages a set of emotion-related music attributes as the bridge between emotion and music.
Both subjective and objective evaluations show that EmoGen outperforms previous methods on emotion control accuracy and music quality.
arXiv Detail & Related papers (2023-07-03T05:54:29Z) - Emotional Speech-Driven Animation with Content-Emotion Disentanglement [51.34635009347183]
We propose EMOTE, which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion.
EmOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data.
arXiv Detail & Related papers (2023-06-15T09:31:31Z) - Speech Synthesis with Mixed Emotions [77.05097999561298]
We propose a novel formulation that measures the relative difference between the speech samples of different emotions.
We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.
At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector.
arXiv Detail & Related papers (2022-08-11T15:45:58Z) - A Novel Multi-Task Learning Method for Symbolic Music Emotion
Recognition [76.65908232134203]
Symbolic Music Emotion Recognition(SMER) is to predict music emotion from symbolic data, such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
arXiv Detail & Related papers (2022-01-15T07:45:10Z) - Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z) - Musical Prosody-Driven Emotion Classification: Interpreting Vocalists
Portrayal of Emotions Through Machine Learning [0.0]
The role of musical prosody remains under-explored despite several studies demonstrating a strong connection between prosody and emotion.
In this study, we restrict the input of traditional machine learning algorithms to the features of musical prosody.
We utilize a methodology for individual data collection from vocalists, and personal ground truth labeling by the artist themselves.
arXiv Detail & Related papers (2021-06-04T15:40:19Z) - Emotion-Based End-to-End Matching Between Image and Music in
Valence-Arousal Space [80.49156615923106]
Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger.
Existing emotion-based image and music matching methods either employ limited categorical emotion states or train the matching model using an impractical multi-stage pipeline.
In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space.
arXiv Detail & Related papers (2020-08-22T20:12:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.