Global Structure-Aware Drum Transcription Based on Self-Attention
Mechanisms
- URL: http://arxiv.org/abs/2105.05791v1
- Date: Wed, 12 May 2021 17:04:16 GMT
- Title: Global Structure-Aware Drum Transcription Based on Self-Attention
Mechanisms
- Authors: Ryoto Ishizuka, Ryo Nishikimi, Kazuyoshi Yoshii
- Abstract summary: This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal.
To capture the global repetitive structure of drum scores, we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder.
Experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure.
- Score: 18.5148472561169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes an automatic drum transcription (ADT) method that
directly estimates a tatum-level drum score from a music signal, in contrast to
most conventional ADT methods that estimate the frame-level onset probabilities
of drums. To estimate a tatum-level score, we propose a deep transcription
model that consists of a frame-level encoder for extracting the latent features
from a music signal and a tatum-level decoder for estimating a drum score from
the latent features pooled at the tatum level. To capture the global repetitive
structure of drum scores, which is difficult to learn with a recurrent neural
network (RNN), we introduce a self-attention mechanism with tatum-synchronous
positional encoding into the decoder. To mitigate the difficulty of training
the self-attention-based model from an insufficient amount of paired data and
improve the musical naturalness of the estimated scores, we propose a
regularized training method that uses a global structure-aware masked language
(score) model with a self-attention mechanism pretrained from an extensive
collection of drum scores. Experimental results showed that the proposed
regularized model outperformed the conventional RNN-based model in terms of the
tatum-level error rate and the frame-level F-measure, even when only a limited
amount of paired data was available so that the non-regularized model
underperformed the RNN-based model.
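The tatum-synchronous positional encoding described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a standard sinusoidal Transformer encoding whose position axis counts tatums (so the phase pattern aligns with the metrical grid), plus a simple mean-pooling of frame-level encoder features onto the tatum grid; all function names and dimensions are hypothetical.

```python
import numpy as np

def tatum_positional_encoding(num_tatums: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding indexed by tatum position.

    Same form as the standard Transformer encoding, but the position
    axis counts tatums rather than audio frames, so positions stay
    synchronized with the metrical grid regardless of tempo.
    """
    positions = np.arange(num_tatums)[:, np.newaxis]           # (T, 1)
    dims = np.arange(d_model)[np.newaxis, :]                   # (1, D)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                           # (T, D)
    pe = np.zeros((num_tatums, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                      # even dims
    pe[:, 1::2] = np.cos(angles[:, 1::2])                      # odd dims
    return pe

def pool_frames_to_tatums(features: np.ndarray, tatum_frames: list) -> np.ndarray:
    """Average frame-level features within each (start, end) tatum interval,
    producing the tatum-level input to the self-attention decoder."""
    return np.stack([features[s:e].mean(axis=0) for s, e in tatum_frames])
```

The pooled tatum-level features would then be summed with the positional encoding before entering the decoder's self-attention layers.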
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation [10.306522595622651]
We introduce Dynamic Scheduled Sampling with Imitation Loss (DySI), which maintains the schedule based solely on the training time accuracy.
DySI achieves notable improvements on standard machine translation benchmarks, and significantly improves the robustness of other text generation models.
arXiv Detail & Related papers (2023-01-31T16:41:06Z)
- Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming [129.4950757742912]
We introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).
NMR aims at re-purposing a pre-trained model from a source domain to a target domain by modifying the input of a frozen pre-trained model.
Experimental results suggest that a neural model pre-trained on large-scale datasets can successfully perform music genre classification by using this reprogramming method.
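The reprogramming idea above — keep the pre-trained model frozen and train only a modification of its input — can be sketched minimally. This is a toy illustration, not the paper's method: the "pre-trained model" is a hypothetical fixed linear classifier, and the reprogramming is a trainable additive perturbation `delta`; in practice the frozen model would be a large pre-trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained model: a fixed linear map from
# 16-dim inputs to 4 source-domain class logits. Never updated.
W_frozen = rng.standard_normal((16, 4))

def frozen_model(x: np.ndarray) -> np.ndarray:
    return x @ W_frozen

# Reprogramming parameter: an input-space perturbation. During training,
# ONLY delta would receive gradient updates; W_frozen stays fixed.
delta = np.zeros(16)

def reprogrammed(x: np.ndarray) -> np.ndarray:
    """Route the target-domain input through the frozen model after
    adding the learned perturbation."""
    return frozen_model(x + delta)
```

A separate label-mapping step (from source-domain classes to target-domain genres) would typically complete the pipeline.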
arXiv Detail & Related papers (2022-11-02T17:38:33Z)
- Self-Contrastive Learning based Semi-Supervised Radio Modulation Classification [6.089994098441994]
This paper presents a semi-supervised learning framework for automatic modulation classification (AMC).
By carefully utilizing unlabeled signal data with a self-supervised contrastive-learning pre-training step, our framework achieves higher performance given smaller amounts of labeled data.
We evaluate the performance of our semi-supervised framework on a public dataset.
arXiv Detail & Related papers (2022-03-29T22:21:14Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
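The distillation-as-churn-constraint idea can be sketched as a loss function. This is a generic soft-label distillation objective, not necessarily the exact formulation in the paper: pulling the student's predictions toward the base (teacher) model's predictions limits how often the two models disagree, i.e. the predictive churn. The `alpha` weighting and epsilon constants are illustrative choices.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Cross-entropy to true labels plus KL divergence toward the base
    (teacher) model's predictive distribution."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    n = student_logits.shape[0]
    ce = -np.log(p_s[np.arange(n), labels] + 1e-12).mean()
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean()
    return (1 - alpha) * ce + alpha * kl
```

With `alpha = 0` this reduces to ordinary supervised training; larger `alpha` trades accuracy on the labels for agreement with the base model.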
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training [20.69310034107256]
This paper describes a neural drum transcription method that detects from music signals the onset times of drums at the tatum level.
arXiv Detail & Related papers (2020-10-08T03:47:25Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- NAT: Noise-Aware Training for Robust Neural Sequence Labeling [30.91638109413785]
We propose two Noise-Aware Training (NAT) objectives that improve the robustness of sequence labeling performed on noisy input.
Our data augmentation method trains a neural model using a mixture of clean and noisy samples, whereas our stability training algorithm encourages the model to create a noise-invariant latent representation.
Experiments on English and German named entity recognition benchmarks confirmed that NAT consistently improved robustness of popular sequence labeling models.
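The two NAT objectives described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the augmentation objective mixes clean and perturbed samples into one training batch, while the stability objective (here a squared-L2 stand-in for NAT's actual similarity measure) penalizes divergence between representations of clean and perturbed versions of the same input.

```python
import numpy as np

def mix_clean_noisy(clean: np.ndarray, noisy: np.ndarray,
                    noise_prob: float, rng: np.random.Generator) -> np.ndarray:
    """Data-augmentation objective: per sample, train on the noisy
    version with probability noise_prob, else on the clean version."""
    mask = rng.random(len(clean)) < noise_prob
    return np.where(mask[:, None], noisy, clean)

def stability_loss(clean_repr: np.ndarray, noisy_repr: np.ndarray) -> float:
    """Stability objective: encourage a noise-invariant latent
    representation by penalizing the gap between the two encodings."""
    return float(np.mean((clean_repr - noisy_repr) ** 2))
```

In training, the stability term would be added to the usual labeling loss so the encoder maps clean and corrupted inputs to nearby representations.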
arXiv Detail & Related papers (2020-05-14T17:30:06Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.