Timbre Classification of Musical Instruments with a Deep Learning Multi-Head Attention-Based Model
- URL: http://arxiv.org/abs/2107.06231v1
- Date: Tue, 13 Jul 2021 16:34:19 GMT
- Title: Timbre Classification of Musical Instruments with a Deep Learning Multi-Head Attention-Based Model
- Authors: Carlos Hernandez-Olivan, Jose R. Beltran
- Abstract summary: The aim of this work is to define a model that is able to identify different instrument timbres with as few parameters as possible.
It has been possible to assess the ability to classify instruments by timbre even if the instruments are playing the same note with the same intensity.
- Score: 1.7188280334580197
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The aim of this work is to define a model based on deep learning that is able
to identify different instrument timbres with as few parameters as possible.
For this purpose, we have worked with classical orchestral instruments played
with different dynamics, which are part of a few instrument families and which
play notes in the same pitch range. It has been possible to assess the ability
to classify instruments by timbre even if the instruments are playing the same
note with the same intensity. The network employed uses a multi-head attention
mechanism, with 8 heads and a dense network at the output taking as input the
log-mel magnitude spectrograms of the sound samples. This network allows the
identification of 20 instrument classes of the classical orchestra, achieving
an overall F$_1$ value of 0.62. An analysis of the weights of the attention
layer has been performed and the confusion matrix of the model is presented,
allowing us to assess the ability of the proposed architecture to distinguish
timbre and to establish the aspects on which future work should focus.
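As a concrete illustration, here is a minimal PyTorch sketch of the architecture described in the abstract: an 8-head self-attention layer over the time frames of a log-mel magnitude spectrogram, followed by a dense network that classifies into 20 instrument classes. The mel-band count, hidden size, and mean pooling over time are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class TimbreClassifier(nn.Module):
    def __init__(self, n_mels=128, n_heads=8, n_classes=20):
        super().__init__()
        # Multi-head self-attention over time frames of the log-mel spectrogram.
        self.attention = nn.MultiheadAttention(embed_dim=n_mels,
                                               num_heads=n_heads,
                                               batch_first=True)
        # Dense network at the output, as described in the abstract.
        self.classifier = nn.Sequential(
            nn.Linear(n_mels, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, x):
        # x: (batch, time_frames, n_mels) log-mel magnitude spectrogram.
        attn_out, attn_weights = self.attention(x, x, x)
        pooled = attn_out.mean(dim=1)  # average over time frames
        # Returning the weights allows the kind of attention analysis the paper performs.
        return self.classifier(pooled), attn_weights

model = TimbreClassifier()
logits, weights = model(torch.randn(4, 64, 128))  # 4 clips, 64 frames each
```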
Related papers
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
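A rough sketch of MLM-style acoustic pre-training with teacher pseudo labels, in the spirit of MERT: random frames are masked and the student predicts teacher-derived discrete codes at the masked positions. The stand-in teacher codes and the masking scheme here are assumptions; MERT's actual teachers and losses differ in detail.

```python
import torch
import torch.nn.functional as F

def mlm_acoustic_step(student, teacher_codes, features, mask_prob=0.15):
    # features: (batch, time, dim); teacher_codes: (batch, time) integer
    # pseudo labels provided by a teacher model (e.g. k-means codes).
    mask = torch.rand(features.shape[:2]) < mask_prob
    masked = features.clone()
    masked[mask] = 0.0                 # zero out the masked frames
    logits = student(masked)           # (batch, time, codebook_size)
    # Cross-entropy only on the masked positions, as in masked language modelling.
    return F.cross_entropy(logits[mask], teacher_codes[mask])
```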
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- Pitch-Informed Instrument Assignment Using a Deep Convolutional Network with Multiple Kernel Shapes [22.14133334414372]
This paper proposes a deep convolutional neural network for performing note-level instrument assignment.
Experiments on the MusicNet dataset using 7 instrument classes show that our approach is able to achieve an average F-score of 0.904.
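A hedged sketch of a convolutional network with multiple kernel shapes: parallel branches with frequency-oriented, time-oriented, and square kernels applied to the same input, then concatenated for classification. The specific kernel sizes and channel counts are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiKernelCNN(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        # One branch per kernel shape, applied to the same spectrogram-like input.
        self.branches = nn.ModuleList([
            nn.Conv2d(1, 16, kernel_size=(13, 3), padding=(6, 1)),  # frequency-oriented
            nn.Conv2d(1, 16, kernel_size=(3, 13), padding=(1, 6)),  # time-oriented
            nn.Conv2d(1, 16, kernel_size=(5, 5), padding=2),        # square
        ])
        self.head = nn.Linear(16 * 3, n_classes)

    def forward(self, x):  # x: (batch, 1, freq, time)
        feats = [b(x).relu().mean(dim=(2, 3)) for b in self.branches]
        return self.head(torch.cat(feats, dim=1))
```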
arXiv Detail & Related papers (2021-07-28T19:48:09Z)
- Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition [9.768677073327423]
We exploit hierarchical relationships between instruments in a few-shot learning setup to enable classification of a wider set of musical instruments.
Compared to a non-hierarchical few-shot baseline, our method leads to a significant increase in classification accuracy and a significant decrease in mistake severity on instrument classes unseen in training.
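One way to exploit such hierarchical relationships, sketched below as an illustrative prototypical-network variant rather than the paper's exact method: each fine-grained class distance is blended with the distance to its instrument-family prototype, so errors tend to stay within the correct family.

```python
import torch

def hierarchical_logits(query, class_protos, family_protos, class_to_family, alpha=0.5):
    # query: (dim,); class_protos: (n_classes, dim); family_protos: (n_families, dim);
    # class_to_family: LongTensor mapping each class to its family index.
    d_class = torch.cdist(query[None], class_protos)[0]    # (n_classes,)
    d_family = torch.cdist(query[None], family_protos)[0]  # (n_families,)
    # Blend fine-grained and family-level distances per class.
    blended = alpha * d_class + (1 - alpha) * d_family[class_to_family]
    return -blended  # negative distance serves as logits
```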
arXiv Detail & Related papers (2021-07-14T22:50:24Z)
- Towards Automatic Instrumentation by Learning to Separate Parts in Symbolic Multitrack Music [33.679951600368405]
We study the feasibility of automatic instrumentation -- dynamically assigning instruments to notes in solo music during performance.
In addition to the online, real-time-capable setting for performative use cases, automatic instrumentation can also find applications in assistive composing tools in an offline setting.
We frame the task of part separation as a sequential multi-class classification problem and adopt machine learning to map sequences of notes into sequences of part labels.
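A hedged sketch of that framing: a bidirectional LSTM maps a sequence of note features to a part label per note. The feature dimension, hidden size, and number of parts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PartSeparator(nn.Module):
    def __init__(self, note_dim=4, hidden=64, n_parts=5):
        super().__init__()
        self.rnn = nn.LSTM(note_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_parts)  # one label per note

    def forward(self, notes):       # notes: (batch, seq_len, note_dim)
        h, _ = self.rnn(notes)
        return self.out(h)          # (batch, seq_len, n_parts) part logits
```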
arXiv Detail & Related papers (2021-07-13T08:34:44Z)
- Deep Neural Network for Musical Instrument Recognition using MFCCs [0.6445605125467573]
Musical instrument recognition is the task of identifying an instrument from its audio.
In this paper, we use an artificial neural network (ANN) model that was trained to perform classification on twenty different classes of musical instruments.
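A minimal sketch of MFCC-based classification with a plain feed-forward network, assuming librosa for feature extraction; the paper's exact features and layer sizes may differ.

```python
import librosa
import torch
import torch.nn as nn

def mfcc_features(path, n_mfcc=20):
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return torch.from_numpy(mfcc.mean(axis=1)).float()      # average over time

ann = nn.Sequential(         # simple ANN over the 20 MFCC means
    nn.Linear(20, 128), nn.ReLU(),
    nn.Linear(128, 20),      # twenty instrument classes
)
```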
arXiv Detail & Related papers (2021-05-03T15:10:34Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
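An illustrative sketch of note-sequence generation with embeddings and a memory cell; an LSTM is used here, and GRU or vanilla RNN cells could be swapped in to compare memory mechanisms. The vocabulary size is an assumption.

```python
import torch
import torch.nn as nn

class NoteGenerator(nn.Module):
    def __init__(self, vocab=128, embed=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.cell = nn.LSTM(embed, hidden, batch_first=True)  # the memory mechanism
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens):  # tokens: (batch, seq_len) note indices
        h, _ = self.cell(self.embed(tokens))
        return self.out(h)      # next-note logits at every step
```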
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy for estimating the separation performance of state-of-the-art deep learning approaches.
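A hedged sketch of the oracle principle with an ideal ratio mask (IRM): given the true source spectrograms, the best achievable masks are computed directly, so separability can be estimated without training a network.

```python
import numpy as np

def ideal_ratio_mask(source_mags, eps=1e-8):
    # source_mags: list of magnitude spectrograms, one per instrument.
    total = np.sum(source_mags, axis=0) + eps
    return [m / total for m in source_mags]  # one mask per source

def oracle_estimates(mixture_spec, source_mags):
    # Apply each IRM to the (complex) mixture spectrogram to get the
    # oracle source estimates used as the separability proxy.
    return [mask * mixture_spec for mask in ideal_ratio_mask(source_mags)]
```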
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
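A rough sketch of a vector-quantized timbre autoencoder: spectral frames are encoded, snapped to the nearest codebook entry, and decoded. The loudness disentanglement and full training losses of the paper are omitted here for brevity.

```python
import torch
import torch.nn as nn

class VQTimbre(nn.Module):
    def __init__(self, spec_dim=513, latent=64, codes=256):
        super().__init__()
        self.encoder = nn.Linear(spec_dim, latent)
        self.codebook = nn.Embedding(codes, latent)  # discrete latent space
        self.decoder = nn.Linear(latent, spec_dim)

    def forward(self, frames):  # frames: (batch, spec_dim)
        z = self.encoder(frames)
        # Snap each latent to its nearest codebook vector (quantization).
        dists = torch.cdist(z, self.codebook.weight)
        idx = dists.argmin(dim=1)
        zq = self.codebook(idx)
        # Straight-through estimator so gradients still reach the encoder.
        zq = z + (zq - z).detach()
        return self.decoder(zq), idx
```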
arXiv Detail & Related papers (2020-07-13T12:35:45Z)
- Visual Attention for Musical Instrument Recognition [72.05116221011949]
We explore the use of an attention mechanism in a timbral-temporal sense, à la visual attention, to improve the performance of musical instrument recognition.
The first approach applies the attention mechanism to the sliding-window paradigm, where the prediction for each timbral-temporal 'instance' is given an attention weight before aggregation to produce the final prediction.
The second approach is based on a recurrent model of visual attention, where the network attends only to parts of the spectrogram and decides where to attend next, given a limited number of 'glimpses'.
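A minimal sketch of the first approach: per-window predictions are weighted by learned attention scores before aggregation. Feature and class dimensions are assumptions.

```python
import torch
import torch.nn as nn

class WindowedAttentionPooling(nn.Module):
    def __init__(self, feat_dim=128, n_classes=20):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # attention weight per window
        self.classify = nn.Linear(feat_dim, n_classes)

    def forward(self, windows):  # windows: (batch, n_windows, feat_dim)
        weights = torch.softmax(self.score(windows), dim=1)  # (batch, n_windows, 1)
        logits = self.classify(windows)       # per-instance predictions
        return (weights * logits).sum(dim=1)  # attention-weighted aggregation
```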
arXiv Detail & Related papers (2020-06-17T03:56:44Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
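A very rough sketch of the audio-visual association step: pooled keypoint (body motion) features and audio features are projected into a shared space and scored jointly. The paper's context-aware graph network is replaced here by simple pooling for brevity, so this is an assumption-laden illustration, not the authors' model.

```python
import torch
import torch.nn as nn

class AVFusion(nn.Module):
    def __init__(self, key_dim=2, audio_dim=128, hidden=64):
        super().__init__()
        self.body = nn.Linear(key_dim, hidden)
        self.audio = nn.Linear(audio_dim, hidden)
        self.assoc = nn.Linear(2 * hidden, 1)  # movement-audio association score

    def forward(self, keypoints, audio):
        # keypoints: (batch, n_joints, key_dim); audio: (batch, audio_dim)
        body = self.body(keypoints).mean(dim=1)  # pool over body joints
        return self.assoc(torch.cat([body, self.audio(audio)], dim=1))
```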
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.