Deep-Learning Architectures for Multi-Pitch Estimation: Towards Reliable
Evaluation
- URL: http://arxiv.org/abs/2202.09198v1
- Date: Fri, 18 Feb 2022 13:52:21 GMT
- Title: Deep-Learning Architectures for Multi-Pitch Estimation: Towards Reliable
Evaluation
- Authors: Christof Weiß, Geoffroy Peeters
- Abstract summary: Multi-pitch estimation aims to detect the simultaneous activity of pitches in polyphonic music recordings.
In this paper, we realize different architectures based on CNNs, the U-net structure, and self-attention components.
We compare variants of these architectures in different sizes for multi-pitch estimation using the MusicNet and Schubert Winterreise datasets.
- Score: 7.599399338954308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting pitch information from music recordings is a challenging but
important problem in music signal processing. Frame-wise transcription or
multi-pitch estimation aims to detect the simultaneous activity of pitches
in polyphonic music recordings and has recently seen major improvements thanks
to deep-learning techniques, with a variety of proposed network architectures.
In this paper, we realize different architectures based on CNNs, the U-net
structure, and self-attention components. We propose several modifications to
these architectures including self-attention modules for skip connections,
recurrent layers to replace the self-attention, and a multi-task strategy with
simultaneous prediction of the degree of polyphony. We compare variants of
these architectures in different sizes for multi-pitch estimation, focusing on
Western classical music beyond the piano-solo scenario using the MusicNet and
Schubert Winterreise datasets. Our experiments indicate that most architectures
yield competitive results and that larger model variants seem to be beneficial.
However, we find that these results substantially depend on randomization
effects and the particular choice of the training-test split, which calls into question
the claim of superiority for particular architectures given only small
improvements. We therefore investigate the influence of dataset splits in the
presence of several movements of a work cycle (cross-version evaluation) and
propose a best-practice splitting strategy for MusicNet, which weakens the
influence of individual test tracks and suppresses overfitting to specific
works and recording conditions. A final evaluation on a mixed dataset suggests
that improvements on one specific dataset do not necessarily generalize to
other scenarios, thus emphasizing the need for further high-quality multi-pitch
datasets in order to reliably measure progress in music transcription tasks.
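The abstract does not spell out the proposed splitting strategy, so the following is only a minimal sketch of the general idea behind a work-disjoint training-test split: all tracks belonging to the same work end up on the same side, so a model cannot score well on the test set merely by having memorized that work. The function name `split_by_work`, the `(track_id, work_id)` pairs, and the `test_fraction` parameter are hypothetical and do not reflect the actual MusicNet metadata or the authors' procedure.

```python
import random
from collections import defaultdict


def split_by_work(tracks, test_fraction=0.2, seed=0):
    """Split tracks so that no work appears in both train and test.

    `tracks` is a list of (track_id, work_id) pairs. Both fields are
    hypothetical metadata used for illustration only.
    """
    # Group track ids by the work they belong to.
    by_work = defaultdict(list)
    for track_id, work_id in tracks:
        by_work[work_id].append(track_id)

    # Shuffle the works (not the tracks) with a fixed seed for repeatability.
    works = sorted(by_work)
    random.Random(seed).shuffle(works)

    # Reserve a fraction of the works, at least one, for testing.
    n_test = max(1, round(test_fraction * len(works)))
    test_works = set(works[:n_test])

    train = [t for w in works if w not in test_works for t in by_work[w]]
    test = [t for w in works if w in test_works for t in by_work[w]]
    return train, test, test_works


# Toy example: seven tracks spread over five works.
example_tracks = [
    ("t1", "A"), ("t2", "A"), ("t3", "B"), ("t4", "C"),
    ("t5", "C"), ("t6", "D"), ("t7", "E"),
]
train_ids, test_ids, held_out_works = split_by_work(example_tracks)
```

Repeating such a split with several seeds and reporting the spread of results is one simple way to see how strongly a score depends on the particular choice of test works.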
Related papers
- Music Genre Classification: A Comparative Analysis of CNN and XGBoost
Approaches with Mel-frequency cepstral coefficients and Mel Spectrograms [0.0]
This study investigates the performance of three models: a proposed convolutional neural network (CNN), the VGG16 with fully connected layers (FC), and an eXtreme Gradient Boosting (XGBoost) approach on different features.
The results show that the MFCC XGBoost model outperformed the others. Furthermore, applying data segmentation in the data preprocessing phase can significantly enhance the performance of the CNNs.
arXiv Detail & Related papers (2024-01-09T01:50:31Z)
- Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music
Transcription [19.228155694144995]
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction.
We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients.
We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods.
arXiv Detail & Related papers (2023-09-27T15:19:05Z)
- Self-Supervised Contrastive Learning for Robust Audio-Sheet Music
Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music
Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Structure-Aware Audio-to-Score Alignment using Progressively Dilated
Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
arXiv Detail & Related papers (2021-01-31T05:14:58Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A
study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Neural Ensemble Search for Uncertainty Estimation and Dataset Shift [67.57720300323928]
Ensembles of neural networks achieve superior performance compared to stand-alone networks in terms of accuracy, uncertainty calibration and robustness to dataset shift.
We propose two methods for automatically constructing ensembles with varying architectures.
We show that the resulting ensembles outperform deep ensembles not only in terms of accuracy but also uncertainty calibration and robustness to dataset shift.
arXiv Detail & Related papers (2020-06-15T17:38:15Z)
- Modeling Musical Structure with Artificial Neural Networks [0.0]
I explore the application of artificial neural networks to different aspects of musical structure modeling.
I show how a connectionist model, the Gated Autoencoder (GAE), can be employed to learn transformations between musical fragments.
I propose a special predictive training of the GAE, which yields a representation of polyphonic music as a sequence of intervals.
arXiv Detail & Related papers (2020-01-06T18:35:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.