A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions
- URL: http://arxiv.org/abs/2202.12257v1
- Date: Thu, 24 Feb 2022 18:09:22 GMT
- Title: A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions
- Authors: Federico Simonetta and Federico Avanzini and Stavros Ntalampiras
- Abstract summary: This study focuses on the perception of music performances when contextual factors, such as room acoustics and instrument, change.
We propose to distinguish the concept of "performance" from that of "interpretation", which expresses the "artistic intention".
- Score: 10.957528713294874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study focuses on the perception of music performances when
contextual factors, such as room acoustics and instrument, change. We propose
to distinguish the concept of "performance" from that of "interpretation",
which expresses the "artistic intention". To assess this distinction, we
carried out an experimental evaluation in which 91 subjects listened to audio
recordings created by resynthesizing MIDI data obtained through Automatic
Music Transcription (AMT) systems and a sensorized acoustic piano. During the
resynthesis, we simulated different contexts and asked listeners to rate how
much the interpretation changes when the context changes. Results show that:
(1) the MIDI format alone cannot fully capture the artistic intention of a
music performance; (2) commonly used objective evaluation measures based on
MIDI data correlate poorly with the average subjective evaluation. To bridge
this gap, we propose a novel measure that is meaningfully correlated with the
outcome of the tests. In addition, we investigate multimodal machine learning
by providing a new score-informed AMT method, and we propose an approximation
algorithm for the $p$-dispersion problem.
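The $p$-dispersion problem mentioned above asks for $p$ items out of a candidate pool such that the minimum pairwise distance among the chosen items is maximized, presumably used here to select a diverse subset of listening stimuli. The abstract does not detail the proposed approximation algorithm, so the following is only a minimal sketch of the classic greedy farthest-point heuristic (a known 2-approximation for the max-min objective), with random feature vectors standing in for the stimuli.

```python
import numpy as np

def greedy_p_dispersion(points: np.ndarray, p: int) -> list[int]:
    """Greedy farthest-point heuristic for max-min p-dispersion.

    Starts from the pair at maximum distance, then repeatedly adds the
    candidate whose minimum distance to the selected set is largest.
    This is the classic 2-approximation for the max-min objective, not
    necessarily the algorithm proposed in the paper.
    """
    # Pairwise Euclidean distances between all candidates.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Seed with the two mutually farthest candidates.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    while len(selected) < p:
        # For each candidate, distance to its nearest selected item.
        min_dist = dist[:, selected].min(axis=1)
        min_dist[selected] = -1.0  # never re-pick selected items
        selected.append(int(np.argmax(min_dist)))
    return selected

# Hypothetical usage: pick 10 maximally diverse stimuli out of 100,
# each described by a 5-dimensional feature vector.
rng = np.random.default_rng(0)
stimuli = rng.normal(size=(100, 5))
print(greedy_p_dispersion(stimuli, p=10))
```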
Related papers
- Automatic Estimation of Singing Voice Musical Dynamics [9.343063100314687]
We propose a methodology for dataset curation.
We compile a dataset comprising 509 singing voice performances annotated with musical dynamics, aligned with 163 score files.
We train a CNN model with varying window sizes to evaluate the effectiveness of estimating musical dynamics.
We conclude from our experiments that bark-scale features outperform log-Mel features for the task of singing voice dynamics prediction.
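Since bark-scale features are reported to outperform log-Mel features, a rough sketch of one way to pool STFT power into bark-spaced bands is given below, using the Traunmüller approximation of the bark scale; the paper's actual feature pipeline may differ, and all parameters here are placeholders.

```python
import numpy as np
from scipy.signal import stft

def hz_to_bark(f_hz):
    # Traunmüller (1990) approximation of the bark scale.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_band_energies(audio, sr, n_bands=24, nperseg=2048):
    """Pool STFT power into bark-spaced bands (a simple sketch,
    not necessarily the paper's exact feature extractor)."""
    freqs, _, Z = stft(audio, fs=sr, nperseg=nperseg)
    power = np.abs(Z) ** 2                       # (freq_bins, frames)
    barks = hz_to_bark(freqs)
    edges = np.linspace(barks[1], barks[-1], n_bands + 1)
    bands = np.stack([
        power[(barks >= lo) & (barks < hi)].sum(axis=0)
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return np.log1p(bands)                       # (n_bands, frames)

# Hypothetical usage on one second of noise at 22.05 kHz.
x = np.random.default_rng(0).normal(size=22050)
print(bark_band_energies(x, sr=22050).shape)
```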
arXiv Detail & Related papers (2024-10-27T18:15:18Z)
- Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems [3.5570874721859016]
Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music.
We identify two primary sources of distribution shift: the music and the sound.
We evaluate the performance of several state-of-the-art AMT systems on two new experimental test sets.
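Benchmarking AMT systems on such test sets usually relies on note-level precision, recall, and F-measure over (onset, offset, pitch) triples. A minimal sketch using mir_eval's transcription metrics, with made-up reference and estimated notes, is shown below; the specific metrics used by the paper are not stated in this summary.

```python
import numpy as np
import mir_eval

# Hypothetical reference and estimated notes: onset/offset in seconds,
# pitches in Hz (mir_eval expects Hz, not MIDI numbers).
ref_intervals = np.array([[0.0, 0.5], [0.5, 1.0]])
ref_pitches = np.array([440.0, 494.0])
est_intervals = np.array([[0.02, 0.48], [0.55, 1.05]])
est_pitches = np.array([440.0, 494.0])

# Note-level scores with the usual 50 ms onset tolerance; offsets are
# ignored here (offset_ratio=None), matching the common "onset+pitch"
# variant of the metric.
p, r, f, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches,
    onset_tolerance=0.05, offset_ratio=None)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```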
arXiv Detail & Related papers (2024-08-08T19:40:28Z)
- Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach [49.2787113554916]
Estimating music piece difficulty is important for organizing educational music collections.
Our work employs explainable descriptors for difficulty estimation in symbolic music representations.
Our approach, evaluated on piano repertoire categorized into 9 classes, achieved 41.4% accuracy independently, with a mean squared error (MSE) of 1.7.
arXiv Detail & Related papers (2024-08-01T11:23:42Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS (realistic-music-score based singing voice synthesis) aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
- Context-aware Automatic Music Transcription [10.957528713294874]
This paper presents an Automatic Music Transcription system that incorporates context-related information.
Motivated by state-of-the-art psychological research, we propose a methodology that boosts the accuracy of AMT systems.
arXiv Detail & Related papers (2022-03-30T13:36:17Z)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective that accommodates both self-augmented positives and negatives sampled from the same music.
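As a rough illustration of the idea, and not the paper's exact PEMR objective, the sketch below derives a positive view of each clip by masking random time frames and treats the other clips in the batch as negatives under a standard InfoNCE loss.

```python
import torch
import torch.nn.functional as F

def mask_frames(x: torch.Tensor, mask_ratio: float = 0.3) -> torch.Tensor:
    """Zero out a random subset of time frames (a crude stand-in for
    the paper's learned positive/negative frame masks)."""
    b, n_mels, n_frames = x.shape
    keep = torch.rand(b, 1, n_frames, device=x.device) > mask_ratio
    return x * keep

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    """InfoNCE: each clip's masked view is the positive, the rest of
    the batch serves as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                    # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Hypothetical usage with a stand-in encoder over (batch, mels, frames).
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(80 * 128, 64))
mels = torch.randn(8, 80, 128)
loss = info_nce(encoder(mels), encoder(mask_frames(mels)))
print(loss.item())
```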
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
- Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
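Here "progressively dilated" means stacking convolutions whose dilation grows with depth, so the receptive field widens without pooling. A minimal PyTorch sketch of such a stack follows; the paper's actual alignment architecture will differ.

```python
import torch
import torch.nn as nn

class ProgressivelyDilatedCNN(nn.Module):
    """1-D conv stack whose dilation doubles at every layer, so the
    receptive field grows exponentially with depth. A generic sketch,
    not the paper's exact alignment network."""
    def __init__(self, in_ch=64, hidden=64, n_layers=5):
        super().__init__()
        layers = []
        for i in range(n_layers):
            d = 2 ** i                       # dilation: 1, 2, 4, 8, 16
            layers += [
                nn.Conv1d(in_ch if i == 0 else hidden, hidden,
                          kernel_size=3, dilation=d, padding=d),
                nn.ReLU(),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):                    # x: (batch, channels, time)
        return self.net(x)

# Hypothetical usage on a batch of chroma-like feature sequences.
model = ProgressivelyDilatedCNN(in_ch=12)
print(model(torch.randn(2, 12, 500)).shape)  # -> torch.Size([2, 64, 500])
```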
arXiv Detail & Related papers (2021-01-31T05:14:58Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
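One plausible way to make an assessment model score-informed is to encode the aligned score alongside the audio and fuse the two streams; the toy model below illustrates this design and is not one of the paper's three architectures.

```python
import torch
import torch.nn as nn

class ScoreInformedAssessor(nn.Module):
    """Toy score-informed assessment model: encode audio and aligned
    score features separately, concatenate, and regress a quality
    rating. Illustrative only; the paper's three specific models are
    not reproduced here."""
    def __init__(self, audio_dim=64, score_dim=16, hidden=32):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.score_enc = nn.GRU(score_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, audio, score):
        _, h_a = self.audio_enc(audio)       # h_a: (1, batch, hidden)
        _, h_s = self.score_enc(score)
        return self.head(torch.cat([h_a[-1], h_s[-1]], dim=1))

# Hypothetical usage: 4 performances, 200 aligned frames each.
model = ScoreInformedAssessor()
rating = model(torch.randn(4, 200, 64), torch.randn(4, 200, 16))
print(rating.shape)                          # -> torch.Size([4, 1])
```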
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- Time-Frequency Scattering Accurately Models Auditory Similarities Between Instrumental Playing Techniques [5.923588533979649]
We show that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone.
We propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques.
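The notion of a cluster graph of auditory similarities can be illustrated by linking each sound to its nearest neighbours in an embedding space and reading clusters off the connected components. The sketch below uses random vectors as hypothetical timbre embeddings; the paper instead derives similarities from time-frequency scattering features and human judgments.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

# Hypothetical timbre embeddings for 30 instrument/mute/technique
# combinations (random stand-ins, not scattering features).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 16))

# Link each sound to its 3 nearest neighbours; connected components of
# the resulting sparse graph act as timbre clusters.
graph = kneighbors_graph(embeddings, n_neighbors=3, mode="connectivity")
n_clusters, labels = connected_components(graph, directed=False)
print(n_clusters, labels)
```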
arXiv Detail & Related papers (2020-07-21T16:37:15Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
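The core idea, representing a signal by the pairwise correlations of its hand-crafted feature trajectories, can be sketched in a few lines; the features and dimensions below are placeholders rather than the paper's actual configuration.

```python
import numpy as np

def correlation_representation(features: np.ndarray) -> np.ndarray:
    """Compress a (n_features, n_frames) matrix of hand-crafted feature
    trajectories into the upper triangle of their Pearson correlation
    matrix. A generic sketch of the correlation-of-feature-pairs idea."""
    corr = np.corrcoef(features)                 # (n_features, n_features)
    iu = np.triu_indices_from(corr, k=1)         # skip the diagonal
    return corr[iu]                              # fixed-size vector

# Hypothetical usage: 10 feature trajectories over 300 frames collapse
# into a 45-dimensional representation (10 choose 2 pairs).
rng = np.random.default_rng(0)
trajectories = rng.normal(size=(10, 300))
print(correlation_representation(trajectories).shape)  # -> (45,)
```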
arXiv Detail & Related papers (2020-03-22T13:34:37Z)