A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions
- URL: http://arxiv.org/abs/2202.12257v1
- Date: Thu, 24 Feb 2022 18:09:22 GMT
- Title: A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions
- Authors: Federico Simonetta and Federico Avanzini and Stavros Ntalampiras
- Abstract summary: This study focuses on the perception of music performances when contextual factors, such as room acoustics and instrument, change.
We propose to distinguish the concept of "performance" from that of "interpretation", which expresses the "artistic intention".
- Score: 10.957528713294874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study focuses on the perception of music performances when
contextual factors, such as room acoustics and instrument, change. We propose
to distinguish the concept of "performance" from that of "interpretation",
which expresses the "artistic intention". To assess this distinction, we
carried out an experimental evaluation in which 91 subjects listened to audio
recordings created by resynthesizing MIDI data obtained through Automatic
Music Transcription (AMT) systems and a sensorized acoustic piano. During the
resynthesis, we simulated different contexts and asked listeners to rate how
much the interpretation changes when the context changes. Results show that:
(1) the MIDI format alone cannot fully capture the artistic intention of a
music performance; (2) commonly used objective evaluation measures based on
MIDI data correlate poorly with the average subjective evaluation. To bridge
this gap, we propose a novel measure that is meaningfully correlated with the
outcome of the tests. In addition, we investigate multimodal machine learning
by providing a new score-informed AMT method, and we propose an approximation
algorithm for the $p$-dispersion problem.
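The $p$-dispersion problem mentioned above asks for $p$ items out of a candidate pool such that the minimum pairwise distance among the chosen items is maximized, presumably used here to select a diverse subset of listening stimuli. The abstract does not detail the proposed approximation algorithm, so the following is only a minimal sketch of the classic greedy farthest-point heuristic (a known 2-approximation for the max-min objective), with random feature vectors standing in for the stimuli.

```python
import numpy as np

def greedy_p_dispersion(points: np.ndarray, p: int) -> list[int]:
    """Greedy farthest-point heuristic for max-min p-dispersion.

    Starts from the pair at maximum distance, then repeatedly adds the
    candidate whose minimum distance to the selected set is largest.
    This is the classic 2-approximation for the max-min objective, not
    necessarily the algorithm proposed in the paper.
    """
    # Pairwise Euclidean distances between all candidates.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Seed with the two mutually farthest candidates.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    while len(selected) < p:
        # For each candidate, distance to its nearest selected item.
        min_dist = dist[:, selected].min(axis=1)
        min_dist[selected] = -1.0  # never re-pick selected items
        selected.append(int(np.argmax(min_dist)))
    return selected

# Hypothetical usage: pick 10 maximally diverse stimuli out of 100,
# each described by a 5-dimensional feature vector.
rng = np.random.default_rng(0)
stimuli = rng.normal(size=(100, 5))
print(greedy_p_dispersion(stimuli, p=10))
```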
Related papers
- Automatic Estimation of Singing Voice Musical Dynamics [9.343063100314687]
We propose a methodology for dataset curation.
We compile a dataset comprising 509 singing voice performances annotated with musical dynamics, aligned with 163 score files.
We train a CNN model with varying window sizes to evaluate the effectiveness of estimating musical dynamics.
We conclude from our experiments that bark-scale features outperform log-Mel features for the task of singing voice dynamics prediction.
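Since bark-scale features are reported to outperform log-Mel features, a rough sketch of one way to pool STFT power into bark-spaced bands is given below, using the Traunmüller approximation of the bark scale; the paper's actual feature pipeline may differ, and all parameters here are placeholders.

```python
import numpy as np
from scipy.signal import stft

def hz_to_bark(f_hz):
    # Traunmüller (1990) approximation of the bark scale.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_band_energies(audio, sr, n_bands=24, nperseg=2048):
    """Pool STFT power into bark-spaced bands (a simple sketch,
    not necessarily the paper's exact feature extractor)."""
    freqs, _, Z = stft(audio, fs=sr, nperseg=nperseg)
    power = np.abs(Z) ** 2                       # (freq_bins, frames)
    barks = hz_to_bark(freqs)
    edges = np.linspace(barks[1], barks[-1], n_bands + 1)
    bands = np.stack([
        power[(barks >= lo) & (barks < hi)].sum(axis=0)
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return np.log1p(bands)                       # (n_bands, frames)

# Hypothetical usage on one second of noise at 22.05 kHz.
x = np.random.default_rng(0).normal(size=22050)
print(bark_band_energies(x, sr=22050).shape)
```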
arXiv Detail & Related papers (2024-10-27T18:15:18Z)
- Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems [3.5570874721859016]
Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music.
We identify two primary sources of distribution shift: the music and the sound.
We evaluate the performance of several state-of-the-art AMT systems on two new experimental test sets.
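Benchmarking AMT systems on such test sets usually relies on note-level precision, recall, and F-measure over (onset, offset, pitch) triples. A minimal sketch using mir_eval's transcription metrics, with made-up reference and estimated notes, is shown below; the specific metrics used by the paper are not stated in this summary.

```python
import numpy as np
import mir_eval

# Hypothetical reference and estimated notes: onset/offset in seconds,
# pitches in Hz (mir_eval expects Hz, not MIDI numbers).
ref_intervals = np.array([[0.0, 0.5], [0.5, 1.0]])
ref_pitches = np.array([440.0, 494.0])
est_intervals = np.array([[0.02, 0.48], [0.55, 1.05]])
est_pitches = np.array([440.0, 494.0])

# Note-level scores with the usual 50 ms onset tolerance; offsets are
# ignored here (offset_ratio=None), matching the common "onset+pitch"
# variant of the metric.
p, r, f, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches,
    onset_tolerance=0.05, offset_ratio=None)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```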
arXiv Detail & Related papers (2024-08-08T19:40:28Z)
- Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach [49.2787113554916]
Estimating music piece difficulty is important for organizing educational music collections.
Our work employs explainable descriptors for difficulty estimation in symbolic music representations.
Our approach, evaluated on piano repertoire categorized into 9 classes, achieved 41.4% accuracy independently, with a mean squared error (MSE) of 1.7.
arXiv Detail & Related papers (2024-08-01T11:23:42Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS (realistic-music-score based singing voice synthesis) aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
- Context-aware Automatic Music Transcription [10.957528713294874]
This paper presents an Automatic Music Transcription system that incorporates context-related information.
Motivated by state-of-the-art psychological research, we propose a methodology that boosts the accuracy of AMT systems.
arXiv Detail & Related papers (2022-03-30T13:36:17Z)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective that accommodates both self-augmented positives and negatives sampled from the same music.
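As a rough illustration of the idea, and not the paper's exact PEMR objective, the sketch below derives a positive view of each clip by masking random time frames and treats the other clips in the batch as negatives under a standard InfoNCE loss.

```python
import torch
import torch.nn.functional as F

def mask_frames(x: torch.Tensor, mask_ratio: float = 0.3) -> torch.Tensor:
    """Zero out a random subset of time frames (a crude stand-in for
    the paper's learned positive/negative frame masks)."""
    b, n_mels, n_frames = x.shape
    keep = torch.rand(b, 1, n_frames, device=x.device) > mask_ratio
    return x * keep

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    """InfoNCE: each clip's masked view is the positive, the rest of
    the batch serves as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                    # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Hypothetical usage with a stand-in encoder over (batch, mels, frames).
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(80 * 128, 64))
mels = torch.randn(8, 80, 128)
loss = info_nce(encoder(mels), encoder(mask_frames(mels)))
print(loss.item())
```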
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
- Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
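Here "progressively dilated" means stacking convolutions whose dilation grows with depth, so the receptive field widens without pooling. A minimal PyTorch sketch of such a stack follows; the paper's actual alignment architecture will differ.

```python
import torch
import torch.nn as nn

class ProgressivelyDilatedCNN(nn.Module):
    """1-D conv stack whose dilation doubles at every layer, so the
    receptive field grows exponentially with depth. A generic sketch,
    not the paper's exact alignment network."""
    def __init__(self, in_ch=64, hidden=64, n_layers=5):
        super().__init__()
        layers = []
        for i in range(n_layers):
            d = 2 ** i                       # dilation: 1, 2, 4, 8, 16
            layers += [
                nn.Conv1d(in_ch if i == 0 else hidden, hidden,
                          kernel_size=3, dilation=d, padding=d),
                nn.ReLU(),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):                    # x: (batch, channels, time)
        return self.net(x)

# Hypothetical usage on a batch of chroma-like feature sequences.
model = ProgressivelyDilatedCNN(in_ch=12)
print(model(torch.randn(2, 12, 500)).shape)  # -> torch.Size([2, 64, 500])
```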
arXiv Detail & Related papers (2021-01-31T05:14:58Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
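One plausible way to make an assessment model score-informed is to encode the aligned score alongside the audio and fuse the two streams; the toy model below illustrates this design and is not one of the paper's three architectures.

```python
import torch
import torch.nn as nn

class ScoreInformedAssessor(nn.Module):
    """Toy score-informed assessment model: encode audio and aligned
    score features separately, concatenate, and regress a quality
    rating. Illustrative only; the paper's three specific models are
    not reproduced here."""
    def __init__(self, audio_dim=64, score_dim=16, hidden=32):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.score_enc = nn.GRU(score_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, audio, score):
        _, h_a = self.audio_enc(audio)       # h_a: (1, batch, hidden)
        _, h_s = self.score_enc(score)
        return self.head(torch.cat([h_a[-1], h_s[-1]], dim=1))

# Hypothetical usage: 4 performances, 200 aligned frames each.
model = ScoreInformedAssessor()
rating = model(torch.randn(4, 200, 64), torch.randn(4, 200, 16))
print(rating.shape)                          # -> torch.Size([4, 1])
```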
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- Time-Frequency Scattering Accurately Models Auditory Similarities Between Instrumental Playing Techniques [5.923588533979649]
We show that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone.
We propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques.
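The notion of a cluster graph of auditory similarities can be illustrated by linking each sound to its nearest neighbours in an embedding space and reading clusters off the connected components. The sketch below uses random vectors as hypothetical timbre embeddings; the paper instead derives similarities from time-frequency scattering features and human judgments.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

# Hypothetical timbre embeddings for 30 instrument/mute/technique
# combinations (random stand-ins, not scattering features).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 16))

# Link each sound to its 3 nearest neighbours; connected components of
# the resulting sparse graph act as timbre clusters.
graph = kneighbors_graph(embeddings, n_neighbors=3, mode="connectivity")
n_clusters, labels = connected_components(graph, directed=False)
print(n_clusters, labels)
```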
arXiv Detail & Related papers (2020-07-21T16:37:15Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
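The core idea, representing a signal by the pairwise correlations of its hand-crafted feature trajectories, can be sketched in a few lines; the features and dimensions below are placeholders rather than the paper's actual configuration.

```python
import numpy as np

def correlation_representation(features: np.ndarray) -> np.ndarray:
    """Compress a (n_features, n_frames) matrix of hand-crafted feature
    trajectories into the upper triangle of their Pearson correlation
    matrix. A generic sketch of the correlation-of-feature-pairs idea."""
    corr = np.corrcoef(features)                 # (n_features, n_features)
    iu = np.triu_indices_from(corr, k=1)         # skip the diagonal
    return corr[iu]                              # fixed-size vector

# Hypothetical usage: 10 feature trajectories over 300 frames collapse
# into a 45-dimensional representation (10 choose 2 pairs).
rng = np.random.default_rng(0)
trajectories = rng.normal(size=(10, 300))
print(correlation_representation(trajectories).shape)  # -> (45,)
```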
arXiv Detail & Related papers (2020-03-22T13:34:37Z)