Online Symbolic Music Alignment with Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2401.00466v1
- Date: Sun, 31 Dec 2023 11:42:42 GMT
- Title: Online Symbolic Music Alignment with Offline Reinforcement Learning
- Authors: Silvan David Peter
- Abstract summary: Symbolic Music Alignment is the process of matching performed MIDI notes to corresponding score notes.
In this paper, we introduce a reinforcement learning-based online symbolic music alignment technique.
The proposed model outperforms a state-of-the-art reference model of offline symbolic music alignment.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic Music Alignment is the process of matching performed MIDI notes to
corresponding score notes. In this paper, we introduce a reinforcement learning
(RL)-based online symbolic music alignment technique. The RL agent, an
attention-based neural network, iteratively estimates the current score
position from local score and performance contexts. For this symbolic alignment
task, environment states can be sampled exhaustively and the reward is dense,
which makes it straightforward to formulate the task as a simplified offline RL
problem. We evaluate the trained agent in three ways: first, on its ability to
identify correct score positions for sampled test contexts; second, as the core
technique of a complete algorithm for symbolic online note-wise alignment; and
finally, as a real-time symbolic score follower. We further investigate the
pitch-based score and performance representations used as the agent's inputs.
To this end, we develop a second model, a two-step Dynamic Time Warping
(DTW)-based offline alignment algorithm leveraging the same input
representation. The proposed model outperforms a state-of-the-art reference
model of offline symbolic music alignment.
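To make the DTW-based offline alignment idea concrete, here is a minimal sketch of plain, single-step DTW over MIDI pitch sequences. It is not the paper's two-step algorithm and does not use its pitch-based score and performance representations: the binary pitch-match cost and the function name `dtw_align` are hypothetical choices for illustration only.

```python
from typing import List, Tuple


def dtw_align(perf: List[int], score: List[int]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align a performed MIDI pitch sequence to a score pitch sequence with plain DTW.

    Hypothetical sketch: local cost is 0 for identical pitches and 1 otherwise
    (an assumption, not the paper's representation). Returns the accumulated
    cost and the warping path as (performance_index, score_index) pairs.
    """
    n, m = len(perf), len(score)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost of aligning perf[:i] with score[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if perf[i - 1] == score[j - 1] else 1.0
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Backtrack from (n, m) to (1, 1), always moving to the cheapest predecessor
    # (ties prefer the diagonal step, then the step that advances the score)
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        if i == 1 and j == 1:
            break
        steps = [
            (D[i - 1][j - 1], i - 1, j - 1),  # diagonal: both advance
            (D[i - 1][j], i - 1, j),          # extra performed note, same score note
            (D[i][j - 1], i, j - 1),          # score note skipped by the performance
        ]
        _, i, j = min(steps)
    path.reverse()
    return D[n][m], path


if __name__ == "__main__":
    # The repeated performed 62 maps onto the single score 62
    cost, path = dtw_align([60, 62, 62, 64], [60, 62, 64])
    print(cost, path)  # → 0.0 [(0, 0), (1, 1), (2, 1), (3, 2)]
```

Because DTW needs both complete sequences before it can backtrack, it is inherently an offline method; the RL agent described above instead estimates the score position incrementally from local contexts, which is what allows online alignment and real-time score following.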
Related papers
- Just Label the Repeats for In-The-Wild Audio-to-Score Alignment [7.7805314458791806]
We propose an efficient workflow for alignment of in-the-wild performance audio and corresponding sheet music scans (images).
We show that our proposed jump annotation workflow and improved feature representations together improve alignment accuracy by 150% relative to prior work.
Date: 2024-11-11T23:05:02Z
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
Date: 2023-05-31T16:47:20Z
- AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment [67.10208647482109]
The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings.
This paper proposes AlignSTS, an STS model based on explicit cross-modal alignment.
Experiments show that AlignSTS achieves superior performance in terms of both objective and subjective metrics.
Date: 2023-05-08T06:02:10Z
- Streaming Audio-Visual Speech Recognition with Alignment Regularization [69.30185151873707]
We propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture.
The proposed AV-ASR model achieves WERs of 2.0% and 2.6% on the Lip Reading Sentences 3 dataset in offline and online setups, respectively.
Date: 2022-11-03T20:20:47Z
- Hybrid Routing Transformer for Zero-Shot Learning [83.64532548391]
This paper presents a novel transformer encoder-decoder model, called hybrid routing transformer (HRT).
In the HRT encoder, we embed an active attention, constructed from both bottom-up and top-down dynamic routing pathways, to generate attribute-aligned visual features.
In the HRT decoder, we use static routing to calculate the correlation among the attribute-aligned visual features, the corresponding attribute semantics, and the class attribute vectors to generate the final class label predictions.
Date: 2022-03-29T07:55:08Z
- Exploring single-song autoencoding schemes for audio-based music structure analysis [6.037383467521294]
This work explores a "piece-specific" autoencoding scheme, in which a low-dimensional autoencoder is trained to learn a latent/compressed representation specific to a given song.
We report that the proposed unsupervised autoencoding scheme matches the performance of supervised state-of-the-art methods at a 3-second tolerance.
Date: 2021-10-27T13:48:25Z
- Multi-modal Conditional Bounding Box Regression for Music Score Following [7.360807642941713]
This paper addresses the problem of sheet-image-based online audio-to-score alignment, also known as score following.
A conditional neural network architecture is proposed that directly predicts the x, y coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance.
Date: 2021-05-10T12:43:35Z
- Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) requires both robustness and accuracy at the same time.
We propose a dual-modal framework for target localization, consisting of robust localization that suppresses distractors via ONR and accurate localization that attends precisely to the target center via OFC.
Date: 2021-03-18T08:47:56Z
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods that incorporate score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
Date: 2020-08-01T07:46:24Z
- A Hybrid Approach to Audio-to-Score Alignment [13.269759433551478]
Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece.
Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features.
We explore the usage of neural networks as a preprocessing step for DTW-based automatic alignment methods.
Date: 2020-07-28T16:04:19Z