A Hybrid Approach to Audio-to-Score Alignment
- URL: http://arxiv.org/abs/2007.14333v1
- Date: Tue, 28 Jul 2020 16:04:19 GMT
- Title: A Hybrid Approach to Audio-to-Score Alignment
- Authors: Ruchit Agrawal and Simon Dixon
- Abstract summary: Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece.
Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features.
We explore the usage of neural networks as a preprocessing step for DTW-based automatic alignment methods.
- Score: 13.269759433551478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Audio-to-score alignment aims at generating an accurate mapping between a
performance audio and the score of a given piece. Standard alignment methods
are based on Dynamic Time Warping (DTW) and employ handcrafted features. We
explore the usage of neural networks as a preprocessing step for DTW-based
automatic alignment methods. Experiments on music data from different acoustic
conditions demonstrate that this method generates robust alignments whilst
being adaptable at the same time.
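The abstract refers to Dynamic Time Warping as the standard alignment backbone. As a rough illustration only, the sketch below shows textbook DTW over two sequences of feature vectors in plain Python. It is not the authors' implementation: the function name `dtw_align`, the Euclidean frame distance, the three-step pattern, and the toy one-dimensional "features" are all assumptions made for this example. In the hybrid approach the abstract describes, a neural network would act as a preprocessing step before this stage.

```python
import math


def dtw_align(score_feats, audio_feats):
    """Textbook DTW: return (total_cost, path) for two feature sequences.

    Hypothetical helper for illustration; each sequence is a list of
    equal-length feature vectors (lists of floats).
    """
    def dist(a, b):
        # Euclidean distance between two frames (an assumed local cost).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    n, m = len(score_feats), len(audio_feats)
    inf = float("inf")
    # acc[i][j]: minimal cost of aligning the first i score frames
    # with the first j audio frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(score_feats[i - 1], audio_feats[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1],  # advance both
                                acc[i - 1][j],      # advance score only
                                acc[i][j - 1])      # advance audio only

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path


if __name__ == "__main__":
    # Toy data: the "performance" lingers on the first and last score events.
    score = [[0.0], [1.0], [2.0]]
    audio = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    cost, path = dtw_align(score, audio)
    print(cost, path)
```

Each pair in the returned path maps a score frame index to an audio frame index. Swapping `dist` for a learned similarity is one way a neural component could plug into such a pipeline, though the abstract does not specify the exact interface.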
Related papers
- Automatic Equalization for Individual Instrument Tracks Using Convolutional Neural Networks [2.5944208050492183]
We propose a novel approach for the automatic equalization of individual musical instrument tracks.
Our method begins by identifying the instrument present within a source recording in order to choose its corresponding ideal spectrum as a target.
We build upon a differentiable parametric equalizer matching neural network, demonstrating improvements relative to previously established state-of-the-art.
arXiv Detail & Related papers (2024-07-23T17:55:25Z)
- Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching [17.344430840048094]
Recent learning-based methods prioritize optimal performance on a single stereo pair, resulting in temporal inconsistencies.
We develop a bidirectional alignment mechanism for adjacent frames as a fundamental operation.
Unlike the existing methods, we model this task as local matching and global aggregation.
arXiv Detail & Related papers (2024-03-16T01:38:28Z)
- Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
- AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment [67.10208647482109]
The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings.
This paper proposes AlignSTS, an STS model based on explicit cross-modal alignment.
Experiments show that AlignSTS achieves superior performance in terms of both objective and subjective metrics.
arXiv Detail & Related papers (2023-05-08T06:02:10Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Deep Declarative Dynamic Time Warping for End-to-End Learning of Alignment Paths [54.53208538517505]
This paper addresses learning end-to-end models for time series data that include a temporal alignment step via dynamic time warping (DTW).
We propose a DTW layer based around bi-level optimisation and deep declarative networks, which we name DecDTW.
We show that this property is particularly useful for applications where downstream loss functions are defined on the optimal alignment path itself.
arXiv Detail & Related papers (2023-03-19T21:58:37Z)
- Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation [80.12316877964558]
High-quality data labeling from specific domains is costly and time-consuming for human annotators.
We propose a self-supervised domain adaptation method, based upon an iterative pseudo-forced alignment algorithm.
arXiv Detail & Related papers (2022-10-27T07:23:08Z)
- Towards Context-Aware Neural Performance-Score Synchronisation [2.0305676256390934]
Music synchronisation provides a way to navigate among multiple representations of music in a unified manner.
Traditional synchronisation methods compute alignment using knowledge-driven and performance analysis approaches.
This PhD furthers the development of performance-score synchronisation research by proposing data-driven, context-aware alignment approaches.
arXiv Detail & Related papers (2022-05-31T16:45:25Z)
- A Convolutional-Attentional Neural Framework for Structure-Aware Performance-Score Synchronization [12.951369232106178]
Performance-score synchronization is an integral task in signal processing.
Traditional synchronization methods compute alignment using knowledge-driven approaches.
We present a novel data-driven method for structure-aware performance-score synchronization.
arXiv Detail & Related papers (2022-04-19T11:41:21Z)
- Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment [13.269759433551478]
We propose a method to overcome the limitation of handcrafted features using learned frame similarity for audio-to-score alignment.
We focus on offline audio-to-score alignment of piano music.
arXiv Detail & Related papers (2020-11-15T14:58:03Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)