Tracing Back Music Emotion Predictions to Sound Sources and Intuitive
Perceptual Qualities
- URL: http://arxiv.org/abs/2106.07787v2
- Date: Wed, 16 Jun 2021 16:25:14 GMT
- Title: Tracing Back Music Emotion Predictions to Sound Sources and Intuitive
Perceptual Qualities
- Authors: Shreyan Chowdhury, Verena Praher, Gerhard Widmer
- Abstract summary: Music emotion recognition is an important task in Music Information Retrieval (MIR) research.
One important step towards better models would be to understand what a model is actually learning from the data.
We show how to derive explanations of model predictions in terms of spectrogram image segments that connect to the high-level emotion prediction.
- Score: 6.832341432995627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Music emotion recognition is an important task in MIR (Music Information
Retrieval) research. Owing to factors like the subjective nature of the task
and the variation of emotional cues between musical genres, there are still
significant challenges in developing reliable and generalizable models. One
important step towards better models would be to understand what a model is
actually learning from the data and how the prediction for a particular input
is made. In previous work, we have shown how to derive explanations of model
predictions in terms of spectrogram image segments that connect to the
high-level emotion prediction via a layer of easily interpretable perceptual
features. However, that scheme lacks intuitive musical comprehensibility at the
spectrogram level. In the present work, we bridge this gap by merging audioLIME
-- a source-separation based explainer -- with mid-level perceptual features,
thus forming an intuitive connection chain between the input audio and the
output emotion predictions. We demonstrate the usefulness of this method by
applying it to debug a biased emotion prediction model.
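To make the chain concrete, here is a minimal sketch of the idea: separated sound sources are switched on and off LIME-style, predictions from a mid-level-features-to-emotion model are collected, and a linear surrogate assigns each source an importance score. The separation step and both models below are illustrative stand-ins, not audioLIME's actual API or the authors' code.

```python
import numpy as np

def separate_sources(audio):
    """Hypothetical source separation: return a list of stems."""
    return [audio / 4.0] * 4  # stand-in for vocals/drums/bass/other

def midlevel_features(audio):
    """Hypothetical mid-level model: audio -> perceptual features."""
    return np.array([np.mean(np.abs(audio)), np.std(audio)])

W_EMOTION = np.array([0.8, -0.3])  # hypothetical linear emotion head

def predict_emotion(audio):
    return float(W_EMOTION @ midlevel_features(audio))

def explain(audio, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    stems = separate_sources(audio)
    # Random on/off masks over the separated sources, LIME-style
    masks = rng.integers(0, 2, size=(n_samples, len(stems)))
    preds = np.array([
        predict_emotion(sum(s * m for s, m in zip(stems, mask)))
        for mask in masks
    ])
    # Least-squares surrogate: one importance weight per sound source
    weights, *_ = np.linalg.lstsq(masks.astype(float), preds, rcond=None)
    return weights

audio = np.random.default_rng(1).standard_normal(22050)  # 1 s stand-in clip
print(explain(audio))
```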
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Rationalizing Predictions by Adversarial Information Calibration [65.19407304154177]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
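As a rough illustration of the selector-predictor half of this setup (the adversarial information calibration between the two models is omitted for brevity), a minimal PyTorch sketch with made-up shapes:

```python
import torch
import torch.nn as nn

class SelectorPredictor(nn.Module):
    def __init__(self, n_features=32, n_classes=4):
        super().__init__()
        self.selector = nn.Linear(n_features, n_features)   # gate logits
        self.predictor = nn.Linear(n_features, n_classes)

    def forward(self, x):
        gate = torch.sigmoid(self.selector(x))  # soft rationale mask
        logits = self.predictor(x * gate)       # predict from rationale only
        return logits, gate

model = SelectorPredictor()
x = torch.randn(8, 32)
y = torch.randint(0, 4, (8,))
logits, gate = model(x)
# Task loss plus a sparsity penalty so the selected rationale stays compact
loss = nn.functional.cross_entropy(logits, y) + 0.01 * gate.mean()
loss.backward()
```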
arXiv Detail & Related papers (2023-01-15T03:13:09Z)
- Relating Human Perception of Musicality to Prediction in a Predictive Coding Model [0.8062120534124607]
We explore the use of a neural network inspired by predictive coding for modeling human music perception.
This network was developed based on the computational neuroscience theory of recurrent interactions in the hierarchical visual cortex.
We adapt this network to model the hierarchical auditory system and investigate whether it will make similar choices to humans regarding the musicality of a set of random pitch sequences.
arXiv Detail & Related papers (2022-10-29T12:20:01Z)
- Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks [0.0]
We study the most common features and models used in recent publications to tackle this problem, revealing which ones are best suited for recognizing emotion in a cappella songs.
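A hedged sketch of this kind of comparison, pairing a common hand-crafted feature (MFCCs via librosa) with two off-the-shelf classifiers; the clips and labels below are synthetic placeholders, not the paper's dataset:

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
sr = 22050
clips = [rng.standard_normal(sr) for _ in range(20)]  # 20 fake 1-s clips
labels = np.array([0, 1] * 10)                        # fake emotion labels

# Mean MFCC vector per clip: a widely used hand-crafted representation
X = np.stack([
    librosa.feature.mfcc(y=clip, sr=sr, n_mfcc=13).mean(axis=1)
    for clip in clips
])

# Compare two standard models with cross-validation
for clf in (SVC(), RandomForestClassifier(n_estimators=50)):
    scores = cross_val_score(clf, X, labels, cv=4)
    print(type(clf).__name__, scores.mean())
```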
arXiv Detail & Related papers (2022-09-24T16:13:25Z)
- Cadence Detection in Symbolic Classical Music using Graph Neural Networks [7.817685358710508]
We present a graph representation of symbolic scores as an intermediate means to solve the cadence detection task.
We approach cadence detection as an imbalanced node classification problem using a Graph Convolutional Network.
Our experiments suggest that graph convolution can learn non-local features that assist in cadence detection, freeing us from the need of having to devise specialized features that encode non-local context.
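For illustration, a minimal numpy sketch of a single graph-convolution layer over a random stand-in score graph; the real model, note features, and cadence labels are of course more elaborate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 12, 8, 2                        # notes, feature dim, classes
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.maximum(A, A.T)                    # undirected score graph
A_hat = A + np.eye(n)                     # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization

X = rng.standard_normal((n, d))           # note-level input features
W = rng.standard_normal((d, c)) * 0.1     # layer weights (untrained)
H = np.maximum(A_norm @ X @ W, 0.0)       # one graph convolution + ReLU
print(H.argmax(1))                        # class per note (untrained)
# In practice, training would use a class-weighted loss to handle the
# cadence/non-cadence imbalance the summary mentions.
```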
arXiv Detail & Related papers (2022-08-31T12:39:57Z)
- Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and Latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
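The distribution alignment can be sketched with a gradient reversal layer, a standard domain-adaptation trick; the toy linear encoders below stand in for the LSTM-attention EEG model and the pretrained music tagger and are not the paper's architecture:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_out):
        return -grad_out  # flip gradients: encoders learn to fool the critic

eeg_enc = nn.Linear(64, 16)     # stand-in for the LSTM-attention EEG model
music_enc = nn.Linear(128, 16)  # stand-in for the pretrained music tagger
critic = nn.Linear(16, 1)       # domain discriminator: EEG vs. music

eeg, music = torch.randn(8, 64), torch.randn(8, 128)
z = torch.cat([eeg_enc(eeg), music_enc(music)])           # shared space
domain = torch.cat([torch.zeros(8, 1), torch.ones(8, 1)])
logits = critic(GradReverse.apply(z))
loss = nn.functional.binary_cross_entropy_with_logits(logits, domain)
loss.backward()  # encoders receive reversed (alignment) gradients
```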
arXiv Detail & Related papers (2022-02-20T07:32:12Z)
- Visualizing Ensemble Predictions of Music Mood [4.5383186433033735]
We show that visualization techniques can effectively convey the popular prediction as well as uncertainty at different music sections along the temporal axis.
We introduce a new variant of ThemeRiver, called "dual-flux ThemeRiver", which allows viewers to observe and measure the most popular prediction more easily.
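A generic stream-graph approximation of this idea in matplotlib (a stackplot with a "wiggle" baseline gives a ThemeRiver-like layout; the paper's dual-flux variant is more refined, and the vote data here is synthetic):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
t = np.arange(60)                              # one value per second
votes = rng.dirichlet(np.ones(3), size=60).T   # ensemble share per mood
plt.stackplot(t, votes, labels=["happy", "sad", "tense"],
              baseline="wiggle")               # ThemeRiver-style baseline
plt.xlabel("time (s)")
plt.ylabel("share of ensemble votes")
plt.legend()
plt.show()
```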
arXiv Detail & Related papers (2021-12-14T18:13:21Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
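The pipeline shape can be sketched as a learnable FIR filter bank followed by a fixed log-variance feature; this is a generic stand-in under assumed shapes, not EEGminer's actual filter parameterization:

```python
import torch
import torch.nn as nn

class LearnableFilterFeatures(nn.Module):
    def __init__(self, n_channels=32, n_filters=8, kernel=65):
        super().__init__()
        # Each output channel is a learnable FIR filter over the EEG channels
        self.filters = nn.Conv1d(n_channels, n_filters, kernel,
                                 padding=kernel // 2)

    def forward(self, x):                    # x: (batch, channels, time)
        filtered = self.filters(x)
        # Pre-determined feature extraction: log-variance over time
        return torch.log(filtered.var(dim=-1) + 1e-6)

model = LearnableFilterFeatures()
eeg = torch.randn(4, 32, 1000)               # 4 trials of fake EEG
print(model(eeg).shape)                      # -> torch.Size([4, 8])
```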
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
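The core idea is simple enough to sketch in a few lines: correlate per-frame feature trajectories pairwise and keep the upper triangle of the correlation matrix as a compact, fixed-size descriptor (the feature values here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((10, 200))  # 10 features x 200 frames

corr = np.corrcoef(features)               # 10x10 feature-pair correlations
iu = np.triu_indices_from(corr, k=1)       # skip diagonal and duplicates
descriptor = corr[iu]                      # 45 values instead of 10x200
print(descriptor.shape)
```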
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks Artist Identification, Music Genre Classification and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)