A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions
- URL: http://arxiv.org/abs/2202.12257v1
- Date: Thu, 24 Feb 2022 18:09:22 GMT
- Title: A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions
- Authors: Federico Simonetta and Federico Avanzini and Stavros Ntalampiras
- Abstract summary: This study focuses on the perception of music performances when contextual factors, such as room acoustics and instrument, change. We propose to distinguish the concept of "performance" from that of "interpretation", which expresses the "artistic intention".
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study focuses on the perception of music performances when contextual
factors, such as room acoustics and instrument, change. We propose to
distinguish the concept of "performance" from that of "interpretation",
which expresses the "artistic intention". To assess this distinction,
we carried out an experimental evaluation in which 91 subjects were invited to
listen to various audio recordings created by resynthesizing MIDI data obtained
through Automatic Music Transcription (AMT) systems and a sensorized acoustic
piano. During the resynthesis, we simulated different contexts and asked
listeners to evaluate how much the interpretation changes when the context
changes. Results show that: (1) the MIDI format alone cannot completely
capture the artistic intention of a music performance; (2) commonly used objective
evaluation measures based on MIDI data show low correlation with the
average subjective evaluation. To bridge this gap, we propose a novel measure
that is meaningfully correlated with the outcome of the tests. In addition, we
investigate multimodal machine learning by providing a new score-informed AMT
method and propose an approximation algorithm for the $p$-dispersion problem.
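The abstract reports that MIDI-based objective measures correlate weakly with the average subjective evaluation, but it does not say how that correlation was computed. As a minimal sketch, assuming ratings are arranged as one row per listener and one column per excerpt, one could average the ratings per excerpt and then compute Pearson and Spearman coefficients against an objective score. The function names, data layout, and example numbers are hypothetical and are not the authors' procedure or data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def average_ratings(ratings_per_listener):
    """Mean rating per excerpt; rows are listeners, columns are excerpts."""
    n = len(ratings_per_listener)
    return [sum(col) / n for col in zip(*ratings_per_listener)]


if __name__ == "__main__":
    # Hypothetical data: 3 listeners rating 4 resynthesized excerpts,
    # plus one hypothetical objective MIDI-based score per excerpt.
    ratings = [[4, 2, 5, 3], [3, 2, 4, 4], [5, 1, 4, 3]]
    objective = [0.91, 0.88, 0.95, 0.70]
    mean_ratings = average_ratings(ratings)
    print("Pearson:", pearson(objective, mean_ratings))
    print("Spearman:", spearman(objective, mean_ratings))
```

Spearman is included alongside Pearson because listening-test ratings are ordinal, so a rank-based coefficient is often the more conservative choice; which one fits a given study is a judgment call.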
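The abstract mentions an approximation algorithm for the $p$-dispersion problem but does not describe it. For orientation, max-min $p$-dispersion asks for $p$ of $n$ items whose smallest pairwise distance is as large as possible. The sketch below is the classic greedy heuristic (seed with the farthest pair, then repeatedly add the item farthest from the current selection), not the authors' algorithm. When distances satisfy the triangle inequality, this greedy is known to come within a factor of 2 of the optimum. The function names and the use of Euclidean distance are assumptions.

```python
import math
from itertools import combinations


def greedy_p_dispersion(points, p, dist=math.dist):
    """Greedy max-min p-dispersion: return p indices spread far apart."""
    n = len(points)
    if not 2 <= p <= n:
        raise ValueError("p must satisfy 2 <= p <= len(points)")
    # Seed the selection with the farthest pair of points.
    i, j = max(combinations(range(n), 2),
               key=lambda ij: dist(points[ij[0]], points[ij[1]]))
    chosen = [i, j]
    # min_d[k] is the distance from point k to its nearest chosen point.
    min_d = [min(dist(points[k], points[i]), dist(points[k], points[j]))
             for k in range(n)]
    while len(chosen) < p:
        # Add the unchosen point that is farthest from the current selection.
        best = max((k for k in range(n) if k not in chosen),
                   key=lambda k: min_d[k])
        chosen.append(best)
        # Update nearest-chosen distances with the newly added point.
        for k in range(n):
            min_d[k] = min(min_d[k], dist(points[k], points[best]))
    return chosen


def min_pairwise(points, idx, dist=math.dist):
    """Objective value: smallest distance between any two selected points."""
    return min(dist(points[a], points[b]) for a, b in combinations(idx, 2))
```

For example, with the points `[(0, 0), (1, 0), (10, 0), (5, 0), (9, 0)]` and `p = 3`, the greedy picks indices `[0, 2, 3]`, whose minimum pairwise distance is 5.0.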
Related papers
- Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems (2024-08-08T19:40:28Z)
  Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music.
  We identify two primary sources of distribution shift: the music and the sound.
  We evaluate the performance of several state-of-the-art (SotA) AMT systems on two new experimental test sets.
- Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach (2024-08-01T11:23:42Z)
  Estimating the difficulty of a music piece is important for organizing educational music collections.
  Our work employs explainable descriptors for difficulty estimation in symbolic music representations.
  Our approach, evaluated on piano repertoire categorized into 9 classes, achieved 41.4% accuracy independently, with a mean squared error (MSE) of 1.7.
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models (2024-06-07T06:38:59Z)
  We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
  We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
  Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis (2023-05-18T03:57:51Z)
  Realistic-music-score-based singing voice synthesis (RMS-SVS) aims to generate high-quality singing voices given realistic music scores with different note types.
  We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
  In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
- Context-aware Automatic Music Transcription (2022-03-30T13:36:17Z)
  This paper presents an Automatic Music Transcription system that incorporates context-related information.
  Motivated by state-of-the-art psychological research, we propose a methodology for boosting the accuracy of AMT systems.
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation (2022-03-17T07:11:42Z)
  This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
  We devise a novel contrastive learning objective to accommodate both self-augmented positives and negatives sampled from the same music.
- Tracing Back Music Emotion Predictions to Sound Sources and Intuitive Perceptual Qualities (2021-06-14T22:49:19Z)
  Music emotion recognition is an important task in Music Information Retrieval (MIR) research.
  One important step towards better models would be to understand what a model is actually learning from the data.
  We show how to derive explanations of model predictions in terms of spectrogram image segments that connect to the high-level emotion prediction.
- Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks (2021-01-31T05:14:58Z)
  The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
  We present a novel method to detect such differences using progressively dilated convolutional neural networks.
- Score-informed Networks for Music Performance Assessment (2020-08-01T07:46:24Z)
  Deep neural network-based methods that incorporate score information into music performance assessment (MPA) models have not yet been investigated.
  We introduce three different models capable of score-informed performance assessment.
- Time-Frequency Scattering Accurately Models Auditory Similarities Between Instrumental Playing Techniques (2020-07-21T16:37:15Z)
  We show that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone.
  We propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques.
- Audio Impairment Recognition Using a Correlation-Based Feature Representation (2020-03-22T13:34:37Z)
  We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
  We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.