Towards Explaining Expressive Qualities in Piano Recordings: Transfer of Explanatory Features via Acoustic Domain Adaptation
- URL: http://arxiv.org/abs/2102.13479v1
- Date: Fri, 26 Feb 2021 13:49:44 GMT
- Authors: Shreyan Chowdhury and Gerhard Widmer
- Abstract summary: In this work, we show that by utilising unsupervised domain adaptation together with receptive-field regularised deep neural networks, it is possible to significantly improve generalisation to this domain.
We demonstrate that our domain-adapted models can better predict and explain expressive qualities in classical piano performances, as perceived and described by human listeners.
- Score: 8.071506311915396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotion and expressivity in music have been topics of considerable interest
in the field of music information retrieval. In recent years, mid-level
perceptual features have been suggested as means to explain computational
predictions of musical emotion. We find that the diversity of musical styles
and genres in the available dataset for learning these features is not
sufficient for models to generalise well to specialised acoustic domains such
as solo piano music. In this work, we show that by utilising unsupervised
domain adaptation together with receptive-field regularised deep neural
networks, it is possible to significantly improve generalisation to this
domain. Additionally, we demonstrate that our domain-adapted models can better
predict and explain expressive qualities in classical piano performances, as
perceived and described by human listeners.
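Neither the abstract nor the summary specifies the adaptation mechanism, so the following is a minimal sketch of one standard technique in this family: domain-adversarial training with a gradient reversal layer. The tiny CNN backbone and the seven mid-level outputs are stand-ins for illustration, not the paper's receptive-field regularised architecture.

```python
# Minimal sketch of unsupervised domain adaptation via domain-adversarial
# training (gradient reversal). The backbone is a stand-in, NOT the paper's
# receptive-field regularised CNN.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class DomainAdaptedMidLevelModel(nn.Module):
    def __init__(self, n_midlevel=7):
        super().__init__()
        self.backbone = nn.Sequential(          # operates on log-mel patches
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.midlevel_head = nn.Linear(32, n_midlevel)  # trained on source labels
        self.domain_head = nn.Linear(32, 2)             # source vs. target domain

    def forward(self, spec, lam=1.0):
        h = self.backbone(spec)
        return self.midlevel_head(h), self.domain_head(GradReverse.apply(h, lam))
```

In training, labelled source batches would update both heads, while unlabelled solo-piano batches contribute only to the domain loss; the reversed gradient pushes the backbone towards features that transfer across acoustic domains.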
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
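As a rough illustration of the ingredients such a system needs, here is a hypothetical sketch of fusing text and image embeddings into a single conditioning sequence for a diffusion denoiser; all module names and dimensions are invented for illustration and are not MeLFusion's actual design.

```python
# Hypothetical sketch of multimodal conditioning for a music diffusion model:
# project text and image embeddings into a shared space and concatenate them
# into one conditioning sequence for cross-attention in the denoiser.
import torch
import torch.nn as nn


class MultimodalConditioner(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, cond_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, cond_dim)
        self.image_proj = nn.Linear(image_dim, cond_dim)

    def forward(self, text_tokens, image_emb):
        # text_tokens: (batch, n_tokens, text_dim); image_emb: (batch, image_dim)
        text_c = self.text_proj(text_tokens)
        image_c = self.image_proj(image_emb).unsqueeze(1)  # one extra "token"
        return torch.cat([text_c, image_c], dim=1)  # fed to cross-attention


cond = MultimodalConditioner()(torch.randn(2, 16, 768), torch.randn(2, 512))
print(cond.shape)  # torch.Size([2, 17, 256])
```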
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks [0.0]
We study the most common features and models used in recent publications to tackle this problem, revealing which ones are best suited for recognizing emotion in a cappella songs.
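For context, here is a minimal sketch of the kind of pipeline such comparisons are built on: common hand-crafted features extracted with librosa, summarised per song, and fed to a standard classifier. The specific features and model are assumptions, not the paper's exact setup.

```python
# Sketch: widely used hand-crafted audio features + a standard classifier.
import numpy as np
import librosa
from sklearn.svm import SVC


def song_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    # Summarise each time-varying feature by its mean and std over time.
    feats = [mfcc, chroma, centroid]
    return np.concatenate([np.r_[f.mean(axis=1), f.std(axis=1)] for f in feats])


# X: one feature vector per song; y: emotion labels (e.g. quadrant indices).
# X = np.stack([song_features(p) for p in paths])
# clf = SVC(kernel="rbf").fit(X, y)
```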
arXiv Detail & Related papers (2022-09-24T16:13:25Z)
- Learning Neural Acoustic Fields [110.22937202449025]
We introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene.
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to the corresponding impulse responses.
We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location and to predict sound propagation at novel locations.
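A heavily simplified sketch of an implicit acoustic field follows: a coordinate MLP queried with emitter position, listener position, and time, using standard sinusoidal positional encoding. The real NAFs additionally condition on learned local geometric features and predict time-frequency impulse-response representations.

```python
# Simplified sketch of an implicit acoustic field (coordinate MLP).
import torch
import torch.nn as nn


def posenc(x, n_freqs=8):
    """Standard sinusoidal encoding of coordinates in [-1, 1]."""
    bands = 2.0 ** torch.arange(n_freqs) * torch.pi
    ang = x.unsqueeze(-1) * bands                    # (..., dims, n_freqs)
    return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)


class NeuralAcousticField(nn.Module):
    def __init__(self, n_freqs=8, hidden=128):
        super().__init__()
        in_dim = 7 * 2 * n_freqs                     # emitter xyz + listener xyz + time
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, emitter, listener, t):
        q = torch.cat([emitter, listener, t], dim=-1)  # (batch, 7)
        return self.mlp(posenc(q))


naf = NeuralAcousticField()
print(naf(torch.rand(4, 3), torch.rand(4, 3), torch.rand(4, 1)).shape)  # (4, 1)
```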
arXiv Detail & Related papers (2022-04-04T17:59:37Z)
- Towards Cross-Cultural Analysis using Music Information Dynamics [7.4517333921953215]
Music from different cultures establishes different aesthetics through differing style conventions along two aspects.
We propose a framework that could be used to quantitatively compare music from different cultures by looking at these two aspects.
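One simple instance of an information-dynamics measure that could support such comparisons is the entropy rate of a first-order Markov model over pitch classes. This toy measure is assumed here for illustration; the paper's framework is richer.

```python
# Toy information-dynamics measure: first-order Markov entropy rate.
from collections import Counter
from math import log2


def markov_entropy_rate(pitch_classes):
    """Entropy rate (bits/event) of a first-order model of a pitch sequence."""
    pairs = Counter(zip(pitch_classes, pitch_classes[1:]))
    ctx = Counter(pitch_classes[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (a, b), n in pairs.items():
        p_pair = n / total              # joint probability of the bigram
        p_cond = n / ctx[a]             # transition probability P(b | a)
        h -= p_pair * log2(p_cond)
    return h


print(markov_entropy_rate([0, 4, 7, 0, 4, 7, 0, 4, 7, 2]))  # ~0.31 bits/event
```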
arXiv Detail & Related papers (2021-11-24T16:05:29Z)
- Signal-domain representation of symbolic music for learning embedding spaces [2.28438857884398]
We introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal.
We show that our signal-like representation leads to better reconstruction and disentangled features.
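As a hedged illustration of the general idea, the sketch below smooths a binary piano-roll along the pitch axis to obtain a continuous, signal-like representation; the paper's actual transform may differ in detail.

```python
# Sketch: turn a binary piano-roll into a continuous signal by Gaussian
# blurring along the pitch axis.
import numpy as np
from scipy.ndimage import gaussian_filter1d


def roll_to_signal(piano_roll, pitch_sigma=1.5):
    """piano_roll: (n_pitches, n_frames) binary array -> continuous signal."""
    return gaussian_filter1d(piano_roll.astype(float), sigma=pitch_sigma, axis=0)


roll = np.zeros((128, 4))
roll[[60, 64, 67], :] = 1.0               # a sustained C-major triad
signal = roll_to_signal(roll)
print(signal.shape, signal[:, 0].max())   # smooth bumps around pitches 60/64/67
```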
arXiv Detail & Related papers (2021-09-08T06:36:02Z)
- Tracing Back Music Emotion Predictions to Sound Sources and Intuitive Perceptual Qualities [6.832341432995627]
Music emotion recognition is an important task in MIR (Music Information Retrieval) research.
One important step towards better models would be to understand what a model is actually learning from the data.
We show how to derive explanations of model predictions in terms of spectrogram image segments that connect to the high-level emotion prediction.
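A generic way to obtain such segment-level explanations is occlusion: mask one spectrogram region at a time and measure the change in the prediction. The sketch below shows this simple variant, which is not necessarily the paper's exact attribution method.

```python
# Generic occlusion-based attribution over spectrogram segments.
import numpy as np


def occlusion_map(predict, spec, seg_f=16, seg_t=16):
    """predict: callable spectrogram -> scalar emotion score."""
    base = predict(spec)
    n_f, n_t = spec.shape
    heat = np.zeros((n_f // seg_f, n_t // seg_t))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            masked = spec.copy()
            masked[i * seg_f:(i + 1) * seg_f, j * seg_t:(j + 1) * seg_t] = 0.0
            heat[i, j] = base - predict(masked)  # large drop = important segment
    return heat


# Toy stand-in model: "arousal" as overall high-frequency energy.
toy_predict = lambda s: s[64:].mean()
print(occlusion_map(toy_predict, np.random.rand(128, 128)).shape)  # (8, 8)
```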
arXiv Detail & Related papers (2021-06-14T22:49:19Z)
- Musical Prosody-Driven Emotion Classification: Interpreting Vocalists Portrayal of Emotions Through Machine Learning [0.0]
The role of musical prosody remains under-explored despite several studies demonstrating a strong connection between prosody and emotion.
In this study, we restrict the input of traditional machine learning algorithms to the features of musical prosody.
We utilize a methodology for individual data collection from vocalists, with ground-truth labeling performed by the artists themselves.
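A minimal sketch of what restricting the input to musical prosody could look like: summarise pitch (F0), dynamics (RMS energy), and tempo per recording and feed them to a traditional classifier. The feature set and model are illustrative assumptions, not the study's exact choices.

```python
# Sketch: prosody-only features for a traditional classifier.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier


def prosody_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    f0, voiced, _ = librosa.pyin(y, fmin=80.0, fmax=800.0, sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]                  # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    return np.array([f0.mean(), f0.std(), rms.mean(), rms.std(), float(tempo)])


# X = np.stack([prosody_features(p) for p in clips]); y = emotion_labels
# clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```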
arXiv Detail & Related papers (2021-06-04T15:40:19Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
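Such a comparison implies a scaffold like the following: an embedding layer feeding an interchangeable memory cell (LSTM, GRU, or vanilla RNN) that predicts the next musical token. Sizes and vocabulary here are illustrative assumptions.

```python
# Sketch: next-token music model with a configurable memory cell.
import torch
import torch.nn as nn

CELLS = {"lstm": nn.LSTM, "gru": nn.GRU, "rnn": nn.RNN}


class NoteSequenceModel(nn.Module):
    def __init__(self, vocab=388, emb=128, hidden=256, cell="lstm"):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = CELLS[cell](emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                      # next-token logits per step


for cell in CELLS:                              # same interface for every cell
    logits = NoteSequenceModel(cell=cell)(torch.randint(0, 388, (2, 32)))
    print(cell, logits.shape)                   # (2, 32, 388)
```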
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
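A hypothetical sketch of those two stages follows: a graph layer over body keypoints, then fusion of the pooled motion embedding with audio features. The layer sizes and the fusion scheme are invented for illustration, not the paper's architecture.

```python
# Sketch: keypoint graph layer + simple audio-visual fusion.
import torch
import torch.nn as nn


class KeypointGraphLayer(nn.Module):
    def __init__(self, in_dim=2, out_dim=64):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, n_keypoints, in_dim); adj: (n_keypoints, n_keypoints)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(adj @ x / deg))   # mean over graph neighbours


class AudioVisualFusion(nn.Module):
    def __init__(self, audio_dim=128, motion_dim=64):
        super().__init__()
        self.gnn = KeypointGraphLayer(out_dim=motion_dim)
        self.fuse = nn.Linear(audio_dim + motion_dim, 128)

    def forward(self, keypoints, adj, audio_emb):
        motion = self.gnn(keypoints, adj).mean(dim=1)     # pool over keypoints
        return self.fuse(torch.cat([motion, audio_emb], dim=-1))


out = AudioVisualFusion()(torch.randn(4, 21, 2), torch.ones(21, 21), torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 128])
```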
arXiv Detail & Related papers (2020-04-20T17:53:46Z) - Audio Impairment Recognition Using a Correlation-Based Feature
Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show that this representation yields a compact feature dimensionality and improved computational speed in the test stage.
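The core idea admits a very small sketch: compute pairwise correlations between hand-crafted feature trajectories and keep the upper triangle as a compact, fixed-size representation. Details of the paper's exact scheme may differ.

```python
# Sketch: correlation of feature pairs as a compact representation.
import numpy as np


def correlation_representation(features):
    """features: (n_features, n_frames) trajectories -> upper-triangle corrs."""
    corr = np.corrcoef(features)                 # (n_features, n_features)
    iu = np.triu_indices_from(corr, k=1)         # each pair once, no diagonal
    return corr[iu]


feats = np.random.rand(10, 500)                  # e.g. 10 features over time
rep = correlation_representation(feats)
print(rep.shape)                                 # (45,) = 10 * 9 / 2
```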
arXiv Detail & Related papers (2020-03-22T13:34:37Z) - Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with
Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)