Global Structure-Aware Drum Transcription Based on Self-Attention
Mechanisms
- URL: http://arxiv.org/abs/2105.05791v1
- Date: Wed, 12 May 2021 17:04:16 GMT
- Title: Global Structure-Aware Drum Transcription Based on Self-Attention
Mechanisms
- Authors: Ryoto Ishizuka, Ryo Nishikimi, Kazuyoshi Yoshii
- Abstract summary: This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal.
To capture the global repetitive structure of drum scores, we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder.
Experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure.
- Score: 18.5148472561169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes an automatic drum transcription (ADT) method that
directly estimates a tatum-level drum score from a music signal, in contrast to
most conventional ADT methods that estimate the frame-level onset probabilities
of drums. To estimate a tatum-level score, we propose a deep transcription
model that consists of a frame-level encoder for extracting the latent features
from a music signal and a tatum-level decoder for estimating a drum score from
the latent features pooled at the tatum level. To capture the global repetitive
structure of drum scores, which is difficult to learn with a recurrent neural
network (RNN), we introduce a self-attention mechanism with tatum-synchronous
positional encoding into the decoder. To mitigate the difficulty of training
the self-attention-based model from an insufficient amount of paired data and
improve the musical naturalness of the estimated scores, we propose a
regularized training method that uses a global structure-aware masked language
(score) model with a self-attention mechanism pretrained from an extensive
collection of drum scores. Experimental results showed that the proposed
regularized model outperformed the conventional RNN-based model in terms of the
tatum-level error rate and the frame-level F-measure, even when only a limited
amount of paired data was available so that the non-regularized model
underperformed the RNN-based model.
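The tatum-synchronous positional encoding described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a standard sinusoidal Transformer encoding whose position axis counts tatums (so the phase pattern aligns with the metrical grid), plus a simple mean-pooling of frame-level encoder features onto the tatum grid; all function names and dimensions are hypothetical.

```python
import numpy as np

def tatum_positional_encoding(num_tatums: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding indexed by tatum position.

    Same form as the standard Transformer encoding, but the position
    axis counts tatums rather than audio frames, so positions stay
    synchronized with the metrical grid regardless of tempo.
    """
    positions = np.arange(num_tatums)[:, np.newaxis]           # (T, 1)
    dims = np.arange(d_model)[np.newaxis, :]                   # (1, D)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                           # (T, D)
    pe = np.zeros((num_tatums, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                      # even dims
    pe[:, 1::2] = np.cos(angles[:, 1::2])                      # odd dims
    return pe

def pool_frames_to_tatums(features: np.ndarray, tatum_frames: list) -> np.ndarray:
    """Average frame-level features within each (start, end) tatum interval,
    producing the tatum-level input to the self-attention decoder."""
    return np.stack([features[s:e].mean(axis=0) for s, e in tatum_frames])
```

The pooled tatum-level features would then be summed with the positional encoding before entering the decoder's self-attention layers.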
Related papers
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation [10.306522595622651]
We introduce Dynamic Scheduled Sampling with Imitation Loss (DySI), which maintains the schedule based solely on the training time accuracy.
DySI achieves notable improvements on standard machine translation benchmarks, and significantly improves the robustness of other text generation models.
arXiv Detail & Related papers (2023-01-31T16:41:06Z)
- Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming [129.4950757742912]
We introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).
NMR aims at re-purposing a pre-trained model from a source domain to a target domain by modifying the input of a frozen pre-trained model.
Experimental results suggest that a neural model pre-trained on large-scale datasets can successfully perform music genre classification by using this reprogramming method.
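The reprogramming idea above — keep the pre-trained model frozen and train only a modification of its input — can be sketched minimally. This is a toy illustration, not the paper's method: the "pre-trained model" is a hypothetical fixed linear classifier, and the reprogramming is a trainable additive perturbation `delta`; in practice the frozen model would be a large pre-trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained model: a fixed linear map from
# 16-dim inputs to 4 source-domain class logits. Never updated.
W_frozen = rng.standard_normal((16, 4))

def frozen_model(x: np.ndarray) -> np.ndarray:
    return x @ W_frozen

# Reprogramming parameter: an input-space perturbation. During training,
# ONLY delta would receive gradient updates; W_frozen stays fixed.
delta = np.zeros(16)

def reprogrammed(x: np.ndarray) -> np.ndarray:
    """Route the target-domain input through the frozen model after
    adding the learned perturbation."""
    return frozen_model(x + delta)
```

A separate label-mapping step (from source-domain classes to target-domain genres) would typically complete the pipeline.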
arXiv Detail & Related papers (2022-11-02T17:38:33Z)
- Self-Contrastive Learning based Semi-Supervised Radio Modulation Classification [6.089994098441994]
This paper presents a semi-supervised learning framework for automatic modulation classification (AMC).
By carefully utilizing unlabeled signal data with a self-supervised contrastive-learning pre-training step, our framework achieves higher performance given smaller amounts of labeled data.
We evaluate the performance of our semi-supervised framework on a public dataset.
arXiv Detail & Related papers (2022-03-29T22:21:14Z)
- Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low churn training against a number of recent baselines.
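The distillation-as-churn-constraint idea can be sketched as a loss function. This is a generic soft-label distillation objective, not necessarily the exact formulation in the paper: pulling the student's predictions toward the base (teacher) model's predictions limits how often the two models disagree, i.e. the predictive churn. The `alpha` weighting and epsilon constants are illustrative choices.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Cross-entropy to true labels plus KL divergence toward the base
    (teacher) model's predictive distribution."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    n = student_logits.shape[0]
    ce = -np.log(p_s[np.arange(n), labels] + 1e-12).mean()
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean()
    return (1 - alpha) * ce + alpha * kl
```

With `alpha = 0` this reduces to ordinary supervised training; larger `alpha` trades accuracy on the labels for agreement with the base model.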
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training [20.69310034107256]
This paper describes a neural drum transcription method that detects from music signals the onset times of drums at the tatum level.
arXiv Detail & Related papers (2020-10-08T03:47:25Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- NAT: Noise-Aware Training for Robust Neural Sequence Labeling [30.91638109413785]
We propose two Noise-Aware Training (NAT) objectives that improve the robustness of sequence labeling performed on noisy input.
Our data augmentation method trains a neural model using a mixture of clean and noisy samples, whereas our stability training algorithm encourages the model to create a noise-invariant latent representation.
Experiments on English and German named entity recognition benchmarks confirmed that NAT consistently improved robustness of popular sequence labeling models.
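The two NAT objectives described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the augmentation objective mixes clean and perturbed samples into one training batch, while the stability objective (here a squared-L2 stand-in for NAT's actual similarity measure) penalizes divergence between representations of clean and perturbed versions of the same input.

```python
import numpy as np

def mix_clean_noisy(clean: np.ndarray, noisy: np.ndarray,
                    noise_prob: float, rng: np.random.Generator) -> np.ndarray:
    """Data-augmentation objective: per sample, train on the noisy
    version with probability noise_prob, else on the clean version."""
    mask = rng.random(len(clean)) < noise_prob
    return np.where(mask[:, None], noisy, clean)

def stability_loss(clean_repr: np.ndarray, noisy_repr: np.ndarray) -> float:
    """Stability objective: encourage a noise-invariant latent
    representation by penalizing the gap between the two encodings."""
    return float(np.mean((clean_repr - noisy_repr) ** 2))
```

In training, the stability term would be added to the usual labeling loss so the encoder maps clean and corrupted inputs to nearby representations.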
arXiv Detail & Related papers (2020-05-14T17:30:06Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.