Towards Explaining Expressive Qualities in Piano Recordings: Transfer of
Explanatory Features via Acoustic Domain Adaptation
- URL: http://arxiv.org/abs/2102.13479v1
- Date: Fri, 26 Feb 2021 13:49:44 GMT
- Title: Towards Explaining Expressive Qualities in Piano Recordings: Transfer of
Explanatory Features via Acoustic Domain Adaptation
- Authors: Shreyan Chowdhury and Gerhard Widmer
- Abstract summary: In this work, we show that by utilising unsupervised domain adaptation together with receptive-field regularised deep neural networks, it is possible to significantly improve generalisation to this domain.
We demonstrate that our domain-adapted models can better predict and explain expressive qualities in classical piano performances, as perceived and described by human listeners.
- Score: 8.071506311915396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion and expressivity in music have been topics of considerable interest
in the field of music information retrieval. In recent years, mid-level
perceptual features have been suggested as means to explain computational
predictions of musical emotion. We find that the diversity of musical styles
and genres in the available dataset for learning these features is not
sufficient for models to generalise well to specialised acoustic domains such
as solo piano music. In this work, we show that by utilising unsupervised
domain adaptation together with receptive-field regularised deep neural
networks, it is possible to significantly improve generalisation to this
domain. Additionally, we demonstrate that our domain-adapted models can better
predict and explain expressive qualities in classical piano performances, as
perceived and described by human listeners.
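The sketch below is a minimal, hypothetical illustration of the adaptation idea described in the abstract; it is not the authors' implementation. It assumes a DANN-style gradient-reversal objective as the unsupervised domain adaptation mechanism and a deliberately shallow CNN as a stand-in for a receptive-field regularised network; the 7-dimensional mid-level target, layer sizes, and all names are illustrative assumptions.
```python
# Hypothetical sketch only: DANN-style unsupervised domain adaptation for mid-level
# feature regression from mel-spectrograms. Not the paper's released code; the
# adaptation objective, architecture, and all names are assumptions.
import torch
import torch.nn as nn
from torch.autograd import Function


class GradReverse(Function):
    """Identity on the forward pass; reverses (and scales) gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None


class RFRegularisedCNN(nn.Module):
    """Toy feature extractor with few, small convolutions so that the receptive field
    over the spectrogram stays limited (a crude stand-in for RF regularisation)."""
    def __init__(self, n_midlevel=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 1x1 conv keeps the RF small
            nn.AdaptiveAvgPool2d(1),
        )
        self.midlevel_head = nn.Linear(128, n_midlevel)  # e.g. melodiousness, tonal stability, ...
        self.domain_head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, spec, lamb=1.0):
        z = self.features(spec).flatten(1)
        midlevel = self.midlevel_head(z)                      # regression targets (source only)
        domain_logit = self.domain_head(GradReverse.apply(z, lamb))
        return midlevel, domain_logit


def train_step(model, opt, src_spec, src_targets, tgt_spec, lamb=0.5):
    """One step: supervised regression on the labelled source domain plus an adversarial
    domain loss that pushes source and (unlabelled) target features to look alike."""
    opt.zero_grad()
    src_pred, src_dom = model(src_spec, lamb)
    _, tgt_dom = model(tgt_spec, lamb)                        # target domain: solo piano, no labels
    reg_loss = nn.functional.mse_loss(src_pred, src_targets)
    dom_logits = torch.cat([src_dom, tgt_dom]).squeeze(1)
    dom_labels = torch.cat([torch.zeros_like(src_dom), torch.ones_like(tgt_dom)]).squeeze(1)
    dom_loss = nn.functional.binary_cross_entropy_with_logits(dom_logits, dom_labels)
    (reg_loss + dom_loss).backward()
    opt.step()
    return reg_loss.item(), dom_loss.item()


# Illustrative usage: model = RFRegularisedCNN(); opt = torch.optim.Adam(model.parameters(), 1e-4)
# src_spec / tgt_spec: (B, 1, n_mels, n_frames) spectrogram batches; src_targets: (B, 7).
```
Gradient reversal makes the feature extractor maximise the domain classifier's loss, so features that are useful for the mid-level regression are also encouraged to be domain-invariant.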
Related papers
- A Survey of Foundation Models for Music Understanding [60.83532699497597]
This work is one of the early reviews of the intersection of AI techniques and music understanding.
We investigated, analyzed, and tested recent large-scale music foundation models with respect to their music comprehension abilities.
arXiv Detail & Related papers (2024-09-15T03:34:14Z)
- Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features [5.678610585849838]
Pre-trained deep learning embeddings have consistently shown superior performance over handcrafted acoustic features in speech emotion recognition.
Unlike acoustic features with clear physical meaning, these embeddings lack clear interpretability.
This paper proposes a modified probing approach to explain deep learning embeddings in the speech emotion space.
arXiv Detail & Related papers (2024-09-14T19:18:56Z)
- Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
- Joint Learning of Emotions in Music and Generalized Sounds [6.854732863866882]
We propose the use of multiple datasets as a multi-domain learning technique.
Our approach involves creating a common space encompassing features that characterize both generalized sounds and music.
We performed joint learning on the common feature space, leveraging heterogeneous model architectures.
arXiv Detail & Related papers (2024-08-04T12:19:03Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Signal-domain representation of symbolic music for learning embedding spaces [2.28438857884398]
We introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal.
We show that our signal-like representation leads to better reconstruction and disentangled features.
arXiv Detail & Related papers (2021-09-08T06:36:02Z)
- Tracing Back Music Emotion Predictions to Sound Sources and Intuitive Perceptual Qualities [6.832341432995627]
Music emotion recognition is an important task in MIR (Music Information Retrieval) research.
One important step towards better models would be to understand what a model is actually learning from the data.
We show how to derive explanations of model predictions in terms of spectrogram image segments that connect to the high-level emotion prediction. (A rough, illustrative sketch of segment-based spectrogram attribution is given after this list.)
arXiv Detail & Related papers (2021-06-14T22:49:19Z)
- Musical Prosody-Driven Emotion Classification: Interpreting Vocalists Portrayal of Emotions Through Machine Learning [0.0]
The role of musical prosody remains under-explored despite several studies demonstrating a strong connection between prosody and emotion.
In this study, we restrict the input of traditional machine learning algorithms to the features of musical prosody.
We utilize a methodology for individual data collection from vocalists, with personal ground-truth labeling by the artists themselves.
arXiv Detail & Related papers (2021-06-04T15:40:19Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
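As a rough illustration of the segment-based explanation idea mentioned in the "Tracing Back Music Emotion Predictions ..." entry above, the sketch below scores rectangular spectrogram tiles by occlusion. It is not that paper's method; the tile grid, the masking value, and the assumed model interface are all hypothetical.
```python
# Hypothetical occlusion-style attribution over spectrogram tiles; illustrative only.
import torch


def segment_relevance(model, spec, target_idx, n_time=8, n_freq=4, fill_value=0.0):
    """Score each (freq, time) tile of a spectrogram by how much masking it changes
    one model output (e.g. an emotion or mid-level dimension).

    Assumes spec has shape (1, 1, n_f, n_t) and that model(spec) returns a (1, n_outputs)
    tensor; wrap models that return extra outputs accordingly."""
    with torch.no_grad():
        base = model(spec)[0, target_idx].item()
        _, _, n_f, n_t = spec.shape
        f_edges = [round(i * n_f / n_freq) for i in range(n_freq + 1)]
        t_edges = [round(j * n_t / n_time) for j in range(n_time + 1)]
        relevance = torch.zeros(n_freq, n_time)
        for i in range(n_freq):
            for j in range(n_time):
                masked = spec.clone()
                masked[:, :, f_edges[i]:f_edges[i + 1], t_edges[j]:t_edges[j + 1]] = fill_value
                # How much the prediction drops when this tile is hidden = its relevance.
                relevance[i, j] = base - model(masked)[0, target_idx].item()
    return relevance
```
Tiles with high scores mark the time-frequency regions the prediction leans on most; more principled approaches would use perceptually meaningful segments rather than a fixed grid.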