Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment
- URL: http://arxiv.org/abs/2011.07546v1
- Date: Sun, 15 Nov 2020 14:58:03 GMT
- Title: Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment
- Authors: Ruchit Agrawal, Simon Dixon
- Abstract summary: We propose a method to overcome the limitations of handcrafted features using learned frame similarity for audio-to-score alignment.
We focus on offline audio-to-score alignment of piano music.
- Score: 13.269759433551478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Audio-to-score alignment aims at generating an accurate mapping between a
performance audio and the score of a given piece. Standard alignment methods
are based on Dynamic Time Warping (DTW) and employ handcrafted features, which
cannot be adapted to different acoustic conditions. We propose a method to
overcome this limitation using learned frame similarity for audio-to-score
alignment. We focus on offline audio-to-score alignment of piano music.
Experiments on music data from different acoustic conditions demonstrate that
our method achieves higher alignment accuracy than a standard DTW-based method
that uses handcrafted features, and generates robust alignments whilst being
adaptable to different domains.
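To illustrate the DTW baseline the abstract refers to, here is a minimal, self-contained sketch (not the authors' code): it aligns audio feature frames to score feature frames by minimising accumulated frame distance and backtracking the optimal warping path. The cosine distance stands in for a handcrafted feature comparison; the paper's contribution is to replace such a fixed distance with a frame similarity learned by a Siamese network.

```python
# Minimal DTW-based audio-to-score alignment sketch (illustrative only).
# Frames are plain feature vectors; `dist` is the handcrafted frame
# distance that the paper proposes to replace with a learned similarity.

import math

def cosine_distance(a, b):
    """Handcrafted frame distance; the paper learns this instead."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)

def dtw_align(audio_frames, score_frames, dist=cosine_distance):
    """Return the optimal warping path as (audio_idx, score_idx) pairs."""
    n, m = len(audio_frames), len(score_frames)
    INF = float("inf")
    # Accumulated-cost matrix with standard steps (1,1), (1,0), (0,1).
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(audio_frames[i - 1], score_frames[j - 1])
            D[i][j] = c + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from (n, m) to recover the monotonic alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    return list(reversed(path))
```

In a real system each frame would be a chroma or spectral feature vector extracted from the audio and a synthesised rendering of the score; swapping `dist` for a learned network output is the conceptual change the paper investigates.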
Related papers
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a frame-work for controlling pre-trained text-to-music diffusion models at inference-time.
We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
- AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment [67.10208647482109]
The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings.
This paper proposes AlignSTS, an STS model based on explicit cross-modal alignment.
Experiments show that AlignSTS achieves superior performance in terms of both objective and subjective metrics.
arXiv Detail & Related papers (2023-05-08T06:02:10Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z) - Play It Back: Iterative Attention for Audio Recognition [104.628661890361]
A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time.
We propose an end-to-end attention-based architecture that through selective repetition attends over the most discriminative sounds.
We show that our method can consistently achieve state-of-the-art performance across three audio-classification benchmarks.
arXiv Detail & Related papers (2022-10-20T15:03:22Z) - Towards Context-Aware Neural Performance-Score Synchronisation [2.0305676256390934]
Music synchronisation provides a way to navigate among multiple representations of music in a unified manner.
Traditional synchronisation methods compute alignment using knowledge-driven and performance analysis approaches.
This PhD furthers the development of performance-score synchronisation research by proposing data-driven, context-aware alignment approaches.
arXiv Detail & Related papers (2022-05-31T16:45:25Z) - A Convolutional-Attentional Neural Framework for Structure-Aware
Performance-Score Synchronization [12.951369232106178]
Performance-score synchronization is an integral task in signal processing.
Traditional synchronization methods compute alignment using knowledge-driven approaches.
We present a novel data-driven method for structure-aware performance-score synchronization.
arXiv Detail & Related papers (2022-04-19T11:41:21Z) - Using multiple reference audios and style embedding constraints for
speech synthesis [68.62945852651383]
The proposed model can improve the speech naturalness and content quality with multiple reference audios.
The model can also outperform the baseline model in ABX preference tests of style similarity.
arXiv Detail & Related papers (2021-10-09T04:24:29Z) - Strumming to the Beat: Audio-Conditioned Contrastive Video Textures [112.6140796961121]
We introduce a non-parametric approach for infinite video texture synthesis using a representation learned via contrastive learning.
We take inspiration from Video Textures, which showed that plausible new videos could be generated from a single one by stitching its frames together in a novel yet consistent order.
Our model outperforms baselines on human perceptual scores, can handle a diverse range of input videos, and can combine semantic and audio-visual cues in order to synthesize videos that synchronize well with an audio signal.
arXiv Detail & Related papers (2021-04-06T17:24:57Z) - Structure-Aware Audio-to-Score Alignment using Progressively Dilated
Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
arXiv Detail & Related papers (2021-01-31T05:14:58Z) - A Hybrid Approach to Audio-to-Score Alignment [13.269759433551478]
Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece.
Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features.
We explore the usage of neural networks as a preprocessing step for DTW-based automatic alignment methods.
arXiv Detail & Related papers (2020-07-28T16:04:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.