Joint Learning of Emotions in Music and Generalized Sounds
- URL: http://arxiv.org/abs/2408.02009v2
- Date: Wed, 14 Aug 2024 09:28:16 GMT
- Title: Joint Learning of Emotions in Music and Generalized Sounds
- Authors: Federico Simonetta, Francesca Certo, Stavros Ntalampiras,
- Abstract summary: We propose the use of multiple datasets as a multi-domain learning technique.
Our approach involves creating a common space encompassing features that characterize both generalized sounds and music.
We performed joint learning on the common feature space, leveraging heterogeneous model architectures.
- Score: 6.854732863866882
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we aim to determine if generalized sounds and music can share a common emotional space, improving predictions of emotion in terms of arousal and valence. We propose the use of multiple datasets as a multi-domain learning technique. Our approach involves creating a common space encompassing features that characterize both generalized sounds and music, as they can evoke emotions in a similar manner. To achieve this, we utilized two publicly available datasets, namely IADS-E and PMEmo, following a standardized experimental protocol. We employed a wide variety of features that capture diverse aspects of the audio structure including key parameters of spectrum, energy, and voicing. Subsequently, we performed joint learning on the common feature space, leveraging heterogeneous model architectures. Interestingly, this synergistic scheme outperforms the state-of-the-art in both sound and music emotion prediction. The code enabling full replication of the presented experimental pipeline is available at https://github.com/LIMUNIMI/MusicSoundEmotions.
Related papers
- Revisiting Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations [14.058054715262275]
Emotion recognition from speech and music shares similarities due to their acoustic overlap, which has led to interest in transferring knowledge between these domains.
We revisit the acoustic similarity between emotion speech and music, starting with an analysis of the layerwise behavior of Self-Supervised Learning (SSL) models.
Lastly, we explore the acoustic similarities between emotional speech and music using Frechet audio distance for individual emotions, uncovering the issue of emotion bias in both speech and music SSL models.
arXiv Detail & Related papers (2024-09-26T14:49:09Z) - MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - Song Emotion Recognition: a Performance Comparison Between Audio
Features and Artificial Neural Networks [0.0]
We study the most common features and models used to tackle this problem, revealing which ones are best suited for recognizing emotion in a cappella songs.
In this paper, we studied the most common features and models used in recent publications to tackle this problem, revealing which ones are best suited for recognizing emotion in a cappella songs.
arXiv Detail & Related papers (2022-09-24T16:13:25Z) - Contrastive Learning with Positive-Negative Frame Mask for Music
Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music.
arXiv Detail & Related papers (2022-03-17T07:11:42Z) - Enhancing Affective Representations of Music-Induced EEG through
Multimodal Supervision and latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
arXiv Detail & Related papers (2022-02-20T07:32:12Z) - A Novel Multi-Task Learning Method for Symbolic Music Emotion
Recognition [76.65908232134203]
Symbolic Music Emotion Recognition(SMER) is to predict music emotion from symbolic data, such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
arXiv Detail & Related papers (2022-01-15T07:45:10Z) - EEGminer: Discovering Interpretable Features of Brain Activity with
Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z) - Musical Prosody-Driven Emotion Classification: Interpreting Vocalists
Portrayal of Emotions Through Machine Learning [0.0]
The role of musical prosody remains under-explored despite several studies demonstrating a strong connection between prosody and emotion.
In this study, we restrict the input of traditional machine learning algorithms to the features of musical prosody.
We utilize a methodology for individual data collection from vocalists, and personal ground truth labeling by the artist themselves.
arXiv Detail & Related papers (2021-06-04T15:40:19Z) - Towards Explaining Expressive Qualities in Piano Recordings: Transfer of
Explanatory Features via Acoustic Domain Adaptation [8.071506311915396]
In this work, we show that by utilising unsupervised domain adaptation together with receptive-field regularised deep neural networks, it is possible to significantly improve generalisation to this domain.
We demonstrate that our domain-adapted models can better predict and explain expressive qualities in classical piano performances, as perceived and described by human listeners.
arXiv Detail & Related papers (2021-02-26T13:49:44Z) - Emotion-Based End-to-End Matching Between Image and Music in
Valence-Arousal Space [80.49156615923106]
Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger.
Existing emotion-based image and music matching methods either employ limited categorical emotion states or train the matching model using an impractical multi-stage pipeline.
In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space.
arXiv Detail & Related papers (2020-08-22T20:12:23Z) - Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.