Towards Explaining Expressive Qualities in Piano Recordings: Transfer of Explanatory Features via Acoustic Domain Adaptation
- URL: http://arxiv.org/abs/2102.13479v1
- Date: Fri, 26 Feb 2021 13:49:44 GMT
- Authors: Shreyan Chowdhury and Gerhard Widmer
- Abstract summary: In this work, we show that by utilising unsupervised domain adaptation together with receptive-field regularised deep neural networks, it is possible to significantly improve generalisation to this domain.
We demonstrate that our domain-adapted models can better predict and explain expressive qualities in classical piano performances, as perceived and described by human listeners.
- Score: 8.071506311915396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion and expressivity in music have been topics of considerable interest
in the field of music information retrieval. In recent years, mid-level
perceptual features have been suggested as means to explain computational
predictions of musical emotion. We find that the diversity of musical styles
and genres in the available dataset for learning these features is not
sufficient for models to generalise well to specialised acoustic domains such
as solo piano music. In this work, we show that by utilising unsupervised
domain adaptation together with receptive-field regularised deep neural
networks, it is possible to significantly improve generalisation to this
domain. Additionally, we demonstrate that our domain-adapted models can better
predict and explain expressive qualities in classical piano performances, as
perceived and described by human listeners.
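Neither the abstract nor the summary specifies the adaptation mechanism, so the following is a minimal sketch of one standard technique in this family: domain-adversarial training with a gradient reversal layer. The tiny CNN backbone and the seven mid-level outputs are stand-ins for illustration, not the paper's receptive-field regularised architecture.

```python
# Minimal sketch of unsupervised domain adaptation via domain-adversarial
# training (gradient reversal). The backbone is a stand-in, NOT the paper's
# receptive-field regularised CNN.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class DomainAdaptedMidLevelModel(nn.Module):
    def __init__(self, n_midlevel=7):
        super().__init__()
        self.backbone = nn.Sequential(          # operates on log-mel patches
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.midlevel_head = nn.Linear(32, n_midlevel)  # trained on source labels
        self.domain_head = nn.Linear(32, 2)             # source vs. target domain

    def forward(self, spec, lam=1.0):
        h = self.backbone(spec)
        return self.midlevel_head(h), self.domain_head(GradReverse.apply(h, lam))
```

In training, labelled source batches would update both heads, while unlabelled solo-piano batches contribute only to the domain loss; the reversed gradient pushes the backbone towards features that transfer across acoustic domains.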
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
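As a rough illustration of the ingredients such a system needs, here is a hypothetical sketch of fusing text and image embeddings into a single conditioning sequence for a diffusion denoiser; all module names and dimensions are invented for illustration and are not MeLFusion's actual design.

```python
# Hypothetical sketch of multimodal conditioning for a music diffusion model:
# project text and image embeddings into a shared space and concatenate them
# into one conditioning sequence for cross-attention in the denoiser.
import torch
import torch.nn as nn


class MultimodalConditioner(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, cond_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, cond_dim)
        self.image_proj = nn.Linear(image_dim, cond_dim)

    def forward(self, text_tokens, image_emb):
        # text_tokens: (batch, n_tokens, text_dim); image_emb: (batch, image_dim)
        text_c = self.text_proj(text_tokens)
        image_c = self.image_proj(image_emb).unsqueeze(1)  # one extra "token"
        return torch.cat([text_c, image_c], dim=1)  # fed to cross-attention


cond = MultimodalConditioner()(torch.randn(2, 16, 768), torch.randn(2, 512))
print(cond.shape)  # torch.Size([2, 17, 256])
```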
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks [0.0]
We study the most common features and models used in recent publications to tackle this problem, revealing which ones are best suited for recognizing emotion in a cappella songs.
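For context, here is a minimal sketch of the kind of pipeline such comparisons are built on: common hand-crafted features extracted with librosa, summarised per song, and fed to a standard classifier. The specific features and model are assumptions, not the paper's exact setup.

```python
# Sketch: widely used hand-crafted audio features + a standard classifier.
import numpy as np
import librosa
from sklearn.svm import SVC


def song_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    # Summarise each time-varying feature by its mean and std over time.
    feats = [mfcc, chroma, centroid]
    return np.concatenate([np.r_[f.mean(axis=1), f.std(axis=1)] for f in feats])


# X: one feature vector per song; y: emotion labels (e.g. quadrant indices).
# X = np.stack([song_features(p) for p in paths])
# clf = SVC(kernel="rbf").fit(X, y)
```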
arXiv Detail & Related papers (2022-09-24T16:13:25Z)
- Learning Neural Acoustic Fields [110.22937202449025]
We introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene.
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to the corresponding impulse responses.
We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location and to predict sound propagation at novel locations.
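A heavily simplified sketch of an implicit acoustic field follows: a coordinate MLP queried with emitter position, listener position, and time, using standard sinusoidal positional encoding. The real NAFs additionally condition on learned local geometric features and predict time-frequency impulse-response representations.

```python
# Simplified sketch of an implicit acoustic field (coordinate MLP).
import torch
import torch.nn as nn


def posenc(x, n_freqs=8):
    """Standard sinusoidal encoding of coordinates in [-1, 1]."""
    bands = 2.0 ** torch.arange(n_freqs) * torch.pi
    ang = x.unsqueeze(-1) * bands                    # (..., dims, n_freqs)
    return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)


class NeuralAcousticField(nn.Module):
    def __init__(self, n_freqs=8, hidden=128):
        super().__init__()
        in_dim = 7 * 2 * n_freqs                     # emitter xyz + listener xyz + time
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, emitter, listener, t):
        q = torch.cat([emitter, listener, t], dim=-1)  # (batch, 7)
        return self.mlp(posenc(q))


naf = NeuralAcousticField()
print(naf(torch.rand(4, 3), torch.rand(4, 3), torch.rand(4, 1)).shape)  # (4, 1)
```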
arXiv Detail & Related papers (2022-04-04T17:59:37Z)
- Towards Cross-Cultural Analysis using Music Information Dynamics [7.4517333921953215]
Music from different cultures establishes different aesthetics through differing style conventions along two aspects.
We propose a framework that could be used to quantitatively compare music from different cultures by looking at these two aspects.
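One simple instance of an information-dynamics measure that could support such comparisons is the entropy rate of a first-order Markov model over pitch classes. This toy measure is assumed here for illustration; the paper's framework is richer.

```python
# Toy information-dynamics measure: first-order Markov entropy rate.
from collections import Counter
from math import log2


def markov_entropy_rate(pitch_classes):
    """Entropy rate (bits/event) of a first-order model of a pitch sequence."""
    pairs = Counter(zip(pitch_classes, pitch_classes[1:]))
    ctx = Counter(pitch_classes[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (a, b), n in pairs.items():
        p_pair = n / total              # joint probability of the bigram
        p_cond = n / ctx[a]             # transition probability P(b | a)
        h -= p_pair * log2(p_cond)
    return h


print(markov_entropy_rate([0, 4, 7, 0, 4, 7, 0, 4, 7, 2]))  # ~0.31 bits/event
```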
arXiv Detail & Related papers (2021-11-24T16:05:29Z)
- Signal-domain representation of symbolic music for learning embedding spaces [2.28438857884398]
We introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal.
We show that our signal-like representation leads to better reconstruction and disentangled features.
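As a hedged illustration of the general idea, the sketch below smooths a binary piano-roll along the pitch axis to obtain a continuous, signal-like representation; the paper's actual transform may differ in detail.

```python
# Sketch: turn a binary piano-roll into a continuous signal by Gaussian
# blurring along the pitch axis.
import numpy as np
from scipy.ndimage import gaussian_filter1d


def roll_to_signal(piano_roll, pitch_sigma=1.5):
    """piano_roll: (n_pitches, n_frames) binary array -> continuous signal."""
    return gaussian_filter1d(piano_roll.astype(float), sigma=pitch_sigma, axis=0)


roll = np.zeros((128, 4))
roll[[60, 64, 67], :] = 1.0               # a sustained C-major triad
signal = roll_to_signal(roll)
print(signal.shape, signal[:, 0].max())   # smooth bumps around pitches 60/64/67
```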
arXiv Detail & Related papers (2021-09-08T06:36:02Z)
- Tracing Back Music Emotion Predictions to Sound Sources and Intuitive Perceptual Qualities [6.832341432995627]
Music emotion recognition is an important task in MIR (Music Information Retrieval) research.
One important step towards better models would be to understand what a model is actually learning from the data.
We show how to derive explanations of model predictions in terms of spectrogram image segments that connect to the high-level emotion prediction.
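A generic way to obtain such segment-level explanations is occlusion: mask one spectrogram region at a time and measure the change in the prediction. The sketch below shows this simple variant, which is not necessarily the paper's exact attribution method.

```python
# Generic occlusion-based attribution over spectrogram segments.
import numpy as np


def occlusion_map(predict, spec, seg_f=16, seg_t=16):
    """predict: callable spectrogram -> scalar emotion score."""
    base = predict(spec)
    n_f, n_t = spec.shape
    heat = np.zeros((n_f // seg_f, n_t // seg_t))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            masked = spec.copy()
            masked[i * seg_f:(i + 1) * seg_f, j * seg_t:(j + 1) * seg_t] = 0.0
            heat[i, j] = base - predict(masked)  # large drop = important segment
    return heat


# Toy stand-in model: "arousal" as overall high-frequency energy.
toy_predict = lambda s: s[64:].mean()
print(occlusion_map(toy_predict, np.random.rand(128, 128)).shape)  # (8, 8)
```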
arXiv Detail & Related papers (2021-06-14T22:49:19Z)
- Musical Prosody-Driven Emotion Classification: Interpreting Vocalists Portrayal of Emotions Through Machine Learning [0.0]
The role of musical prosody remains under-explored despite several studies demonstrating a strong connection between prosody and emotion.
In this study, we restrict the input of traditional machine learning algorithms to the features of musical prosody.
We utilize a methodology for individual data collection from vocalists, with ground-truth labeling performed by the artists themselves.
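A minimal sketch of what restricting the input to musical prosody could look like: summarise pitch (F0), dynamics (RMS energy), and tempo per recording and feed them to a traditional classifier. The feature set and model are illustrative assumptions, not the study's exact choices.

```python
# Sketch: prosody-only features for a traditional classifier.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier


def prosody_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    f0, voiced, _ = librosa.pyin(y, fmin=80.0, fmax=800.0, sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]                  # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    return np.array([f0.mean(), f0.std(), rms.mean(), rms.std(), float(tempo)])


# X = np.stack([prosody_features(p) for p in clips]); y = emotion_labels
# clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```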
arXiv Detail & Related papers (2021-06-04T15:40:19Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
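Such a comparison implies a scaffold like the following: an embedding layer feeding an interchangeable memory cell (LSTM, GRU, or vanilla RNN) that predicts the next musical token. Sizes and vocabulary here are illustrative assumptions.

```python
# Sketch: next-token music model with a configurable memory cell.
import torch
import torch.nn as nn

CELLS = {"lstm": nn.LSTM, "gru": nn.GRU, "rnn": nn.RNN}


class NoteSequenceModel(nn.Module):
    def __init__(self, vocab=388, emb=128, hidden=256, cell="lstm"):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = CELLS[cell](emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                      # next-token logits per step


for cell in CELLS:                              # same interface for every cell
    logits = NoteSequenceModel(cell=cell)(torch.randint(0, 388, (2, 32)))
    print(cell, logits.shape)                   # (2, 32, 388)
```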
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
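A hypothetical sketch of those two stages follows: a graph layer over body keypoints, then fusion of the pooled motion embedding with audio features. The layer sizes and the fusion scheme are invented for illustration, not the paper's architecture.

```python
# Sketch: keypoint graph layer + simple audio-visual fusion.
import torch
import torch.nn as nn


class KeypointGraphLayer(nn.Module):
    def __init__(self, in_dim=2, out_dim=64):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, n_keypoints, in_dim); adj: (n_keypoints, n_keypoints)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))   # mean over graph neighbours


class AudioVisualFusion(nn.Module):
    def __init__(self, audio_dim=128, motion_dim=64):
        super().__init__()
        self.gnn = KeypointGraphLayer(out_dim=motion_dim)
        self.fuse = nn.Linear(audio_dim + motion_dim, 128)

    def forward(self, keypoints, adj, audio_emb):
        motion = self.gnn(keypoints, adj).mean(dim=1)     # pool over keypoints
        return self.fuse(torch.cat([motion, audio_emb], dim=-1))


out = AudioVisualFusion()(torch.randn(4, 21, 2), torch.ones(21, 21), torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 128])
```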
arXiv Detail & Related papers (2020-04-20T17:53:46Z) - Audio Impairment Recognition Using a Correlation-Based Feature
Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show that this representation yields a compact feature dimensionality and improved computational speed in the test stage.
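The core idea admits a very small sketch: compute pairwise correlations between hand-crafted feature trajectories and keep the upper triangle as a compact, fixed-size representation. Details of the paper's exact scheme may differ.

```python
# Sketch: correlation of feature pairs as a compact representation.
import numpy as np


def correlation_representation(features):
    """features: (n_features, n_frames) trajectories -> upper-triangle corrs."""
    corr = np.corrcoef(features)                 # (n_features, n_features)
    iu = np.triu_indices_from(corr, k=1)         # each pair once, no diagonal
    return corr[iu]


feats = np.random.rand(10, 500)                  # e.g. 10 features over time
rep = correlation_representation(feats)
print(rep.shape)                                 # (45,) = 10 * 9 / 2
```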
arXiv Detail & Related papers (2020-03-22T13:34:37Z) - Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with
Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)