Towards Context-Aware Neural Performance-Score Synchronisation
- URL: http://arxiv.org/abs/2206.00454v1
- Date: Tue, 31 May 2022 16:45:25 GMT
- Title: Towards Context-Aware Neural Performance-Score Synchronisation
- Authors: Ruchit Agrawal
- Abstract summary: Music synchronisation provides a way to navigate among multiple representations of music in a unified manner.
Traditional synchronisation methods compute alignment using knowledge-driven and stochastic approaches, typically employing handcrafted features.
This PhD furthers the development of performance-score synchronisation research by proposing data-driven, context-aware alignment approaches.
- Score: 2.0305676256390934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Music can be represented in multiple forms, such as in the audio form as a
recording of a performance, in the symbolic form as a computer-readable score,
or in the image form as a scan of the sheet music. Music synchronisation
provides a way to navigate among multiple representations of music in a unified
manner by generating an accurate mapping between them, making it applicable
to a myriad of domains like music education, performance analysis,
automatic accompaniment and music editing. Traditional synchronisation methods
compute alignment using knowledge-driven and stochastic approaches, typically
employing handcrafted features. These methods are often unable to generalise
well to different instruments, acoustic environments and recording conditions,
and normally assume complete structural agreement between the performances and
the scores. This PhD furthers the development of performance-score
synchronisation research by proposing data-driven, context-aware alignment
approaches, on three fronts: Firstly, I replace the handcrafted features by
employing a metric learning based approach that is adaptable to different
acoustic settings and performs well in data-scarce conditions. Secondly, I
address the handling of structural differences between the performances and
scores, which is a common limitation of standard alignment methods. Finally, I
eschew the reliance on both feature engineering and dynamic programming, and
propose a completely data-driven synchronisation method that computes
alignments using a neural framework, whilst also being robust to structural
differences between the performances and scores.
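The knowledge-driven baseline that this thesis moves away from typically pairs handcrafted features with dynamic time warping (DTW), a dynamic-programming algorithm that finds the minimum-cost monotonic mapping between two feature sequences. A minimal sketch of that baseline is shown below; the toy sequences and the absolute-difference distance are purely illustrative (real systems operate on chroma or learned features), not taken from the paper:

```python
# Minimal dynamic time warping (DTW) sketch: aligns two feature
# sequences by minimising accumulated pairwise distance.
# Illustrative only -- real aligners use chroma or learned features.

def dtw(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Return (total_cost, alignment_path) for two 1-D sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min((cost[i - 1][j - 1], (i - 1, j - 1)),
                   (cost[i - 1][j], (i - 1, j)),
                   (cost[i][j - 1], (i, j - 1)))[1]
    return cost[n][m], path[::-1]

# A "performance" that stretches the "score" in time:
score = [1, 2, 3, 4]
performance = [1, 1, 2, 3, 3, 4]
total, path = dtw(score, performance)
print(total, path)  # cost 0.0: repeated frames absorb the tempo change
```

Note how the path may pair one score frame with several performance frames, which is exactly how DTW absorbs tempo differences; its weakness, as the abstract notes, is that the monotonic path cannot model structural differences such as skipped or repeated sections.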
Related papers
- End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding [4.604877755214193]
Existing end-to-end piano A2S systems have been trained and evaluated with only synthetic data.
We propose a sequence-to-sequence (Seq2Seq) model with a hierarchical decoder that aligns with the hierarchical structure of musical scores.
We propose a two-stage training scheme, which involves pre-training the model using an expressive performance rendering system on synthetic audio, followed by fine-tuning the model using recordings of human performance.
arXiv Detail & Related papers (2024-05-22T10:52:04Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide range of applications for music generation, including inpainting, outpainting, and looping, as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z) - AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment [67.10208647482109]
The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings.
This paper proposes AlignSTS, an STS model based on explicit cross-modal alignment.
Experiments show that AlignSTS achieves superior performance in terms of both objective and subjective metrics.
arXiv Detail & Related papers (2023-05-08T06:02:10Z) - A Convolutional-Attentional Neural Framework for Structure-Aware
Performance-Score Synchronization [12.951369232106178]
Performance-score synchronization is an integral task in signal processing.
Traditional synchronization methods compute alignment using knowledge-driven approaches.
We present a novel data-driven method for structure-aware performance-score synchronization.
arXiv Detail & Related papers (2022-04-19T11:41:21Z) - Structure-Aware Audio-to-Score Alignment using Progressively Dilated
Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
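The "progressively dilated" idea can be illustrated by how the receptive field of a stack of convolution layers grows when each layer doubles its dilation rate. The sketch below is a generic receptive-field calculation for such a stack, not the paper's actual architecture; the kernel size and depth are illustrative:

```python
# Receptive-field growth for a stack of 1-D convolutions whose
# dilation rate doubles at each layer (1, 2, 4, ...). Generic
# illustration of dilated convolutions, not the paper's exact model.

def receptive_field(num_layers, kernel_size=3):
    """Receptive field (in input frames) after stacked dilated convs."""
    rf = 1
    for layer in range(num_layers):
        dilation = 2 ** layer
        # Each layer extends the field by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * dilation
    return rf

for n in range(1, 6):
    print(n, receptive_field(n))
```

With kernel size 3, a 5-layer progressively dilated stack covers 63 input frames, versus only 11 for the same depth without dilation, which is why dilation helps a network see section-level structure without pooling away temporal resolution.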
arXiv Detail & Related papers (2021-01-31T05:14:58Z) - A framework to compare music generative models using automatic
evaluation metrics extended to rhythm [69.2737664640826]
This paper builds on a framework from previous research that did not consider rhythm, makes a series of design decisions, and then adds rhythm support to evaluate the performance of two RNN memory cells in the creation of monophonic music.
The model handles music transposition, and the framework evaluates the quality of the generated pieces using automatic quantitative, geometry-based metrics that also incorporate rhythm.
arXiv Detail & Related papers (2021-01-19T15:04:46Z) - Sequence Generation using Deep Recurrent Networks and Embeddings: A
study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z) - Learning Frame Similarity using Siamese networks for Audio-to-Score
Alignment [13.269759433551478]
We propose a method that overcomes the limitations of handcrafted features by using learned frame similarity for audio-to-score alignment.
We focus on offline audio-to-score alignment of piano music.
arXiv Detail & Related papers (2020-11-15T14:58:03Z) - Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z) - Audio Impairment Recognition Using a Correlation-Based Feature
Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z) - Continuous Melody Generation via Disentangled Short-Term Representations
and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user specified symbolic scenario combined with a previous music context.
Our model generates long melodies by treating 8-beat note sequences as basic units, and maintains a rhythm pattern structure consistent with another specified song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.