Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition
- URL: http://arxiv.org/abs/2107.07029v1
- Date: Wed, 14 Jul 2021 22:50:24 GMT
- Title: Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition
- Authors: Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Bryan Pardo
- Abstract summary: We exploit hierarchical relationships between instruments in a few-shot learning setup to enable classification of a wider set of musical instruments.
Compared to a non-hierarchical few-shot baseline, our method leads to a significant increase in classification accuracy and a significant decrease in mistake severity on instrument classes unseen in training.
- Score: 9.768677073327423
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning work on musical instrument recognition has generally focused on
instrument classes for which we have abundant data. In this work, we exploit
hierarchical relationships between instruments in a few-shot learning setup to
enable classification of a wider set of musical instruments, given a few
examples at inference. We apply a hierarchical loss function to the training of
prototypical networks, combined with a method to aggregate prototypes
hierarchically, mirroring the structure of a predefined musical instrument
hierarchy. These extensions require no changes to the network architecture and
new levels can be easily added or removed. Compared to a non-hierarchical
few-shot baseline, our method leads to a significant increase in classification
accuracy and a significant decrease in mistake severity on instrument classes unseen
in training.
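The two ideas named in the abstract, hierarchical prototype aggregation and a hierarchical loss on a prototypical network, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the formulas, so the following are assumptions made for illustration only: a parent prototype is the mean of its child prototypes, class scores are a softmax over negative squared Euclidean distances, and the per-level losses are weighted by `exp(-decay * height)`. The toy 2-D vectors stand in for embeddings that a trained network would produce, and the two-level hierarchy (`violin`, `cello`, `trumpet` under `strings` and `brass`) is hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def log_softmax_neg_dist(query, prototypes):
    """Log-probabilities over classes from negative squared distances."""
    names = list(prototypes)
    logits = [-sq_dist(query, prototypes[n]) for n in names]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return {n: l - log_z for n, l in zip(names, logits)}

def build_level_prototypes(support, parent_of):
    """Leaf prototypes are support-set means; parent prototypes
    average the prototypes of their children (assumed aggregation rule)."""
    leaf = {c: mean_vector(embs) for c, embs in support.items()}
    children = {}
    for c, p in parent_of.items():
        children.setdefault(p, []).append(leaf[c])
    parent = {p: mean_vector(vs) for p, vs in children.items()}
    return [leaf, parent]  # index 0 = leaves, index 1 = parents

def hierarchical_loss(query, label, levels, parent_of, decay=1.0):
    """Sum of per-level negative log-likelihoods, weighted by
    exp(-decay * height) (assumed weighting scheme)."""
    targets = [label, parent_of[label]]
    total = 0.0
    for height, (protos, target) in enumerate(zip(levels, targets)):
        log_p = log_softmax_neg_dist(query, protos)
        total += math.exp(-decay * height) * -log_p[target]
    return total

# Toy support set: two "embeddings" per instrument class (hypothetical data).
support = {
    "violin": [[1.0, 0.0], [0.8, 0.2]],
    "cello": [[0.6, 0.4], [0.8, 0.4]],
    "trumpet": [[-1.0, 0.0], [-0.8, 0.2]],
}
parent_of = {"violin": "strings", "cello": "strings", "trumpet": "brass"}

levels = build_level_prototypes(support, parent_of)
query = [0.9, 0.1]
leaf_log_p = log_softmax_neg_dist(query, levels[0])
predicted = max(leaf_log_p, key=leaf_log_p.get)
loss = hierarchical_loss(query, "violin", levels, parent_of)
print(predicted, round(loss, 4))
```

Because the prototypes at every level are computed from the same embeddings, adding or removing a level in this sketch only changes the `parent_of` mapping and the list of targets, not the embedding function, which matches the abstract's remark that no architectural change is needed.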
Related papers
- Phononic materials with effectively scale-separated hierarchical features using interpretable machine learning [57.91994916297646]
Architected hierarchical phononic materials have shown promise for tuning elastodynamic waves and vibrations over multiple frequency ranges.
In this article, hierarchical unit-cells are obtained, where features at each length scale result in a band gap within a targeted frequency range.
Our approach offers a flexible and efficient method for the exploration of new regions in the hierarchical design space.
arXiv Detail & Related papers (2024-08-15T21:35:06Z)
- Classification and Reconstruction Processes in Deep Predictive Coding Networks: Antagonists or Allies? [0.0]
Predictive coding-inspired deep networks for visual computing integrate classification and reconstruction processes in shared intermediate layers.
We take a critical look at how classifying and reconstructing interact in deep learning architectures.
Our findings underscore a significant challenge: Classification-driven information diminishes reconstruction-driven information in intermediate layers' shared representations.
arXiv Detail & Related papers (2024-01-17T14:34:32Z)
- Structural Concept Learning via Graph Attention for Multi-Level Rearrangement Planning [2.7195102129095003]
We propose a deep learning approach to perform multi-level object rearrangement planning for scenes with structural dependency hierarchies.
It is trained on a self-generated simulation data set with intuitive structures and works for unseen scenes with an arbitrary number of objects.
We compare our method with a range of classical and model-based baselines to show that our method leverages its scene understanding to achieve better performance, flexibility, and efficiency.
arXiv Detail & Related papers (2023-09-05T19:35:44Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Automatic Modulation Classification with Deep Neural Networks [0.0]
We show that a combination of dilated convolutions, statistics pooling, and squeeze-and-excitation units results in the strongest performing classifier.
We further investigate this best performer according to various other criteria, including short signal bursts, common misclassifications, and performance across differing modulation categories and modes.
arXiv Detail & Related papers (2023-01-27T15:16:06Z)
- Bi-directional Feature Reconstruction Network for Fine-Grained Few-Shot Image Classification [61.411869453639845]
We introduce a bi-reconstruction mechanism that can simultaneously accommodate inter-class and intra-class variations.
This design effectively helps the model to explore more subtle and discriminative features.
Experimental results on three widely used fine-grained image classification datasets consistently show considerable improvements.
arXiv Detail & Related papers (2022-11-30T16:55:14Z)
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework [75.79736930414715]
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint.
arXiv Detail & Related papers (2022-04-27T21:41:44Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z)
- Timbre Classification of Musical Instruments with a Deep Learning Multi-Head Attention-Based Model [1.7188280334580197]
The aim of this work is to define a model that is able to identify different instrument timbres with as few parameters as possible.
It has been possible to assess the ability to classify instruments by timbre even if the instruments are playing the same note with the same intensity.
arXiv Detail & Related papers (2021-07-13T16:34:19Z)
- MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures [61.73533544385352]
We propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data.
As MetaPerturb is a set-function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures.
arXiv Detail & Related papers (2020-06-13T02:54:59Z)
- Learn Class Hierarchy using Convolutional Neural Networks [0.9569316316728905]
We propose a new architecture for hierarchical classification of images, introducing a stack of deep linear layers with cross-entropy loss functions and center loss combined.
We experimentally show that our hierarchical classifier offers advantages over traditional classification approaches and finds application in computer vision tasks.
arXiv Detail & Related papers (2020-05-18T12:06:43Z)