A Convolutional-Attentional Neural Framework for Structure-Aware
Performance-Score Synchronization
- URL: http://arxiv.org/abs/2204.08822v1
- Date: Tue, 19 Apr 2022 11:41:21 GMT
- Title: A Convolutional-Attentional Neural Framework for Structure-Aware
Performance-Score Synchronization
- Authors: Ruchit Agrawal, Daniel Wolff, Simon Dixon
- Abstract summary: Performance-score synchronization is an integral task in signal processing.
Traditional synchronization methods compute alignment using knowledge-driven approaches.
We present a novel data-driven method for structure-aware performance-score synchronization.
- Score: 12.951369232106178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance-score synchronization is an integral task in signal processing,
which entails generating an accurate mapping between an audio recording of a
performance and the corresponding musical score. Traditional synchronization
methods compute alignment using knowledge-driven and stochastic approaches, and
are typically unable to generalize well to different domains and modalities. We
present a novel data-driven method for structure-aware performance-score
synchronization. We propose a convolutional-attentional architecture trained
with a custom loss based on time-series divergence. We conduct experiments for
the audio-to-MIDI and audio-to-image alignment tasks pertaining to different
score modalities. We validate the effectiveness of our method via ablation
studies and comparisons with state-of-the-art alignment approaches. We
demonstrate that our approach outperforms previous synchronization methods for
a variety of test settings across score modalities and acoustic conditions. Our
method is also robust to structural differences between the performance and
score sequences, which is a common limitation of standard alignment approaches.
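The abstract does not specify the exact time-series divergence used for training. A widely used differentiable alignment objective of this kind is the soft-DTW divergence; the sketch below illustrates that general idea with a toy example and is an assumption, not the authors' actual loss:

```python
import numpy as np

def soft_dtw(D, gamma=1.0):
    """Soft-DTW value for a pairwise cost matrix D (n x m).

    Replaces DTW's hard min with a smooth soft-min so the alignment
    cost is differentiable; lower values indicate a better alignment.
    """
    n, m = D.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # soft-min over the three DTW predecessor cells
            vals = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            smin = -gamma * np.log(np.sum(np.exp(-vals / gamma)))
            R[i, j] = D[i - 1, j - 1] + smin
    return R[n, m]

# toy features for a short performance excerpt and its score
perf = np.array([[0.0], [1.0], [2.0]])
score = np.array([[0.0], [1.0], [2.0]])
D = ((perf[:, None, :] - score[None, :, :]) ** 2).sum(-1)
loss = soft_dtw(D, gamma=0.1)  # near zero: sequences match
```

In a training loop the cost matrix would be computed from learned embeddings of the performance and score frames, so gradients of the soft-DTW value flow back into the network.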
Related papers
- Decomposable Transformer Point Processes [2.1756081703276]
We propose a framework where the advantages of the attention-based architecture are maintained and the limitation of the thinning algorithm is circumvented.
The proposed method attains state-of-the-art performance in predicting the next event of a sequence given its history.
arXiv Detail & Related papers (2024-09-26T13:22:58Z)
- Automatic Equalization for Individual Instrument Tracks Using Convolutional Neural Networks [2.5944208050492183]
We propose a novel approach for the automatic equalization of individual musical instrument tracks.
Our method begins by identifying the instrument present within a source recording in order to choose its corresponding ideal spectrum as a target.
We build upon a differentiable parametric equalizer matching neural network, demonstrating improvements relative to previously established state-of-the-art.
arXiv Detail & Related papers (2024-07-23T17:55:25Z)
- Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching [17.344430840048094]
Recent learning-based methods prioritize optimal performance on a single stereo pair, resulting in temporal inconsistencies.
We develop a bidirectional alignment mechanism for adjacent frames as a fundamental operation.
Unlike the existing methods, we model this task as local matching and global aggregation.
arXiv Detail & Related papers (2024-03-16T01:38:28Z)
- Synchformer: Efficient Synchronization from Sparse Cues [100.89656994681934]
Our contributions include a novel audio-visual synchronization model, and training that decouples extraction from synchronization modelling.
This approach achieves state-of-the-art performance in both dense and sparse settings.
We also extend synchronization model training to AudioSet, a million-scale 'in-the-wild' dataset, investigate evidence attribution techniques for interpretability, and explore a new capability for synchronization models: audio-visual synchronizability.
arXiv Detail & Related papers (2024-01-29T18:59:55Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference-time.
We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
- Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the same inherent spirit of such iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
arXiv Detail & Related papers (2023-03-31T10:53:24Z)
- Towards Context-Aware Neural Performance-Score Synchronisation [2.0305676256390934]
Music synchronisation provides a way to navigate among multiple representations of music in a unified manner.
Traditional synchronisation methods compute alignment using knowledge-driven and performance analysis approaches.
This PhD furthers the development of performance-score synchronisation research by proposing data-driven, context-aware alignment approaches.
arXiv Detail & Related papers (2022-05-31T16:45:25Z)
- FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment [93.09267863425492]
We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable.
We construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures.
arXiv Detail & Related papers (2022-04-07T17:59:32Z)
- Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation [73.1652905564163]
We address the problem of separating individual speech signals from videos using audio-visual neural processing.
Most conventional approaches utilize frame-wise matching criteria to extract shared information between co-occurring audio and video.
We propose a cross-modal affinity network (CaffNet) that learns global correspondence as well as locally-varying affinities between audio and visual streams.
arXiv Detail & Related papers (2021-03-25T15:39:12Z)
- Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization [113.19483349876668]
This paper proposes a novel approach to sound event localization by utilizing an attention-based sequence-to-sequence model.
It yields superior localization performance compared to state-of-the-art methods in both anechoic and reverberant conditions.
arXiv Detail & Related papers (2021-02-28T07:52:20Z)
- Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
arXiv Detail & Related papers (2021-01-31T05:14:58Z)
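Progressively dilated convolutions let a shallow stack cover the long temporal contexts needed to spot structural differences between a performance and its score. As an illustration of why the dilation schedule matters (the exact kernel size and dilations here are assumptions, not the paper's configuration), the receptive field of stacked 1-D convolutions grows linearly in the sum of the dilation rates:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of a stack of dilated 1-D convolutions.

    Each layer with dilation d extends the receptive field by
    (kernel_size - 1) * d frames.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# kernel size 3 with dilation doubling per layer: exponential coverage
print(receptive_field(3, [1, 2, 4, 8, 16]))  # → 63
```

Doubling the dilation each layer gives a receptive field exponential in depth, whereas undilated layers of the same kernel size would need roughly 31 layers to cover the same 63 frames.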
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.