Pitch-Informed Instrument Assignment Using a Deep Convolutional Network
with Multiple Kernel Shapes
- URL: http://arxiv.org/abs/2107.13617v1
- Date: Wed, 28 Jul 2021 19:48:09 GMT
- Title: Pitch-Informed Instrument Assignment Using a Deep Convolutional Network
with Multiple Kernel Shapes
- Authors: Carlos Lordelo, Emmanouil Benetos, Simon Dixon and Sven Ahlbäck
- Abstract summary: This paper proposes a deep convolutional neural network for performing note-level instrument assignment.
Experiments on the MusicNet dataset using 7 instrument classes show that our approach is able to achieve an average F-score of 0.904.
- Score: 22.14133334414372
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper proposes a deep convolutional neural network for performing
note-level instrument assignment. Given a polyphonic multi-instrumental music
signal along with its ground truth or predicted notes, the objective is to
assign an instrumental source for each note. This problem is addressed as a
pitch-informed classification task where each note is analysed individually. We
also propose to utilise several kernel shapes in the convolutional layers in
order to facilitate learning of efficient timbre-discriminative feature maps.
Experiments on the MusicNet dataset using 7 instrument classes show that our
approach is able to achieve an average F-score of 0.904 when the original
multi-pitch annotations are used as the pitch information for the system, and
that it also excels if the note information is provided using third-party
multi-pitch estimation algorithms. We also include ablation studies
investigating the effects of the use of multiple kernel shapes and comparing
different input representations for the audio and the note-related information.
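The core architectural idea of the abstract, convolutional branches with several kernel shapes operating on the same spectrogram, can be sketched in plain NumPy. This is a minimal illustration, not the authors' network: the kernel shapes, sizes, and random initialisation below are all assumptions chosen for clarity, with tall kernels spanning frequency (harmonic structure) and wide kernels spanning time.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Naive 'valid' 2D cross-correlation of a single-channel input."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def multi_kernel_features(spectrogram, kernel_shapes, rng):
    """Apply one randomly initialised kernel per shape, then crop all
    feature maps to a common size so they stack like parallel branches."""
    maps = []
    for kh, kw in kernel_shapes:
        kernel = rng.standard_normal((kh, kw)) * 0.1
        maps.append(conv2d_valid(spectrogram, kernel))
    h = min(m.shape[0] for m in maps)
    w = min(m.shape[1] for m in maps)
    return np.stack([m[:h, :w] for m in maps])

rng = np.random.default_rng(0)
spec = rng.random((64, 100))            # stand-in log-spectrogram (freq x time)
shapes = [(15, 3), (3, 15), (5, 5)]     # tall (harmonic), wide (temporal), square
features = multi_kernel_features(spec, shapes, rng)
print(features.shape)                   # (3, 50, 86)
```

In a real network each branch would hold many learned kernels and the cropped maps would be concatenated along the channel axis before deeper layers; the cropping step here is just one simple way to align 'valid' outputs of different sizes.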
Related papers
- TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic
Music [43.17623332544677]
TONet is a plug-and-play model that improves both tone and octave perceptions.
We present an improved input representation, the Tone-CFP, that explicitly groups harmonics.
We also propose a tone-octave fusion mechanism to improve the final salience feature map.
arXiv Detail & Related papers (2022-02-02T10:55:48Z)
- Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified.
The input data set contains data from different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
- Timbre Classification of Musical Instruments with a Deep Learning Multi-Head Attention-Based Model [1.7188280334580197]
The aim of this work is to define a model that is able to identify different instrument timbres with as few parameters as possible.
The model can classify instruments by timbre even when they play the same note at the same intensity.
arXiv Detail & Related papers (2021-07-13T16:34:19Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
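The oracle principle with an ideal ratio mask mentioned above can be illustrated with a short NumPy sketch. This is a generic textbook formulation, not the paper's evaluation code: each source's mask is its share of the summed source magnitudes, and applying the masks to the magnitude mixture recovers the sources, giving an upper bound on separation quality. The array sizes are arbitrary stand-ins.

```python
import numpy as np

def ideal_ratio_mask(source_mags, eps=1e-8):
    """Oracle masks from the true source magnitude spectrograms:
    each mask is that source's fraction of the total magnitude."""
    total = np.sum(source_mags, axis=0) + eps
    return source_mags / total

rng = np.random.default_rng(1)
sources = rng.random((3, 64, 100))         # hypothetical magnitudes: 3 sources
masks = ideal_ratio_mask(sources)
mixture = sources.sum(axis=0)              # magnitude mixture (phase ignored)
estimates = masks * mixture                # oracle separation
print(np.allclose(masks.sum(axis=0), 1.0)) # masks sum to ~1 per bin
```

Because the masks are computed from ground-truth sources, the resulting estimates bound what any mask-predicting network could achieve on the same data, which is what makes the oracle a useful training-free proxy.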
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z)
- Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks [7.088324036549911]
This paper addresses the extraction of multiple F0 values from polyphonic and a cappella vocal performances using convolutional neural networks (CNNs).
We build upon an existing architecture to produce a pitch salience function of the input signal.
For training, we build a dataset that comprises several multi-track datasets of vocal quartets with F0 annotations.
arXiv Detail & Related papers (2020-09-09T09:11:49Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.