Receptive-Field Regularized CNNs for Music Classification and Tagging
- URL: http://arxiv.org/abs/2007.13503v1
- Date: Mon, 27 Jul 2020 12:48:12 GMT
- Title: Receptive-Field Regularized CNNs for Music Classification and Tagging
- Authors: Khaled Koutini, Hamid Eghbal-Zadeh, Verena Haunschmid, Paul Primus,
Shreyan Chowdhury, Gerhard Widmer
- Abstract summary: We present a principled way to make deep architectures like ResNet competitive for music-related tasks, based on well-designed regularization strategies.
In particular, we analyze the recently introduced Receptive-Field Regularization and Shake-Shake, and show that they significantly improve the generalization of deep CNNs on music-related tasks.
- Score: 8.188197619481466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) have been successfully used in various
Music Information Retrieval (MIR) tasks, both as end-to-end models and as
feature extractors for more complex systems. However, the MIR field is still
dominated by the classical VGG-based CNN architecture variants, often in
combination with more complex modules such as attention, and/or techniques such
as pre-training on large datasets. Deeper models such as ResNet -- which
surpassed VGG by a large margin in other domains -- are rarely used in MIR. One
of the main reasons for this, as we will show, is the lack of generalization of
deeper CNNs in the music domain. In this paper, we present a principled way to
make deep architectures like ResNet competitive for music-related tasks, based
on well-designed regularization strategies. In particular, we analyze the
recently introduced Receptive-Field Regularization and Shake-Shake, and show
that they significantly improve the generalization of deep CNNs on
music-related tasks, and that the resulting deep CNNs can outperform current
more complex models such as CNNs augmented with pre-training and attention. We
demonstrate this on two different MIR tasks and two corresponding datasets,
thus offering our deep regularized CNNs as a new baseline for these datasets,
which can also be used as a feature-extracting module in future, more complex
approaches.
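The two regularizers named in the abstract can be made concrete with a small example. Shake-Shake mixes two parallel residual branches with a random convex combination during training and averages them at test time. The following PyTorch sketch illustrates the general idea (Gastaldi, 2017); the branch layout and channel count are placeholders, and the backward-pass re-randomization of the full method is omitted, so this is not the paper's exact implementation.

```python
import torch
import torch.nn as nn

def make_branch(channels: int) -> nn.Sequential:
    # Placeholder residual branch; the paper's actual branches differ.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
    )

class ShakeShakeBlock(nn.Module):
    """Two parallel residual branches mixed by a random convex combination.

    Training: each sample draws its own alpha ~ U(0, 1).
    Inference: the branches are averaged (alpha = 0.5).
    """
    def __init__(self, channels: int):
        super().__init__()
        self.branch1 = make_branch(channels)
        self.branch2 = make_branch(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # One coefficient per sample, broadcast over (C, H, W).
            alpha = torch.rand(x.size(0), 1, 1, 1, device=x.device)
        else:
            alpha = 0.5
        return x + alpha * self.branch1(x) + (1 - alpha) * self.branch2(x)
```

Drawing one coefficient per sample regularizes every example in a batch independently, which is what makes the technique a stronger regularizer than a fixed 0.5 average.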
Related papers
- Multiway Multislice PHATE: Visualizing Hidden Dynamics of RNNs through Training [6.326396282553267]
Recurrent neural networks (RNNs) are a widely used tool for sequential data analysis; however, they are still often seen as black boxes of computation.
Here, we present Multiway Multislice PHATE (MM-PHATE), a novel method for visualizing the evolution of RNNs' hidden states.
arXiv Detail & Related papers (2024-06-04T05:05:27Z)
- CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNN and GNN together via distillation.
The performance of the distilled "boosted" two-layer GNN on Mini-ImageNet is much higher than that of CNNs containing dozens of layers, such as ResNet152.
arXiv Detail & Related papers (2024-04-23T08:19:08Z)
- Transferability of Convolutional Neural Networks in Stationary Learning Tasks [96.00428692404354]
We introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
We show that a CNN trained on small windows of such signals achieves nearly identical performance on much larger windows without retraining.
Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten.
arXiv Detail & Related papers (2023-07-21T13:51:45Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular, neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- Towards a General Purpose CNN for Long Range Dependencies in $\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on arbitrary resolution, dimensionality and length without structural changes.
We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential ($1\mathrm{D}$) and visual data ($2\mathrm{D}$).
Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
- Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks [7.9495796547433395]
We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization.
We propose several systematic approaches to control the RF of CNNs and test the resulting architectures (the underlying receptive-field arithmetic is sketched after this list).
arXiv Detail & Related papers (2021-05-26T08:36:29Z)
- Scene Understanding for Autonomous Driving [0.0]
We study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN presented in Detectron2.
We observe a significant improvement in performance after fine-tuning these models on the datasets of interest.
We run inference in unusual situations using out-of-context datasets, and present interesting results.
arXiv Detail & Related papers (2021-05-11T09:50:05Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z)
- Disentangling Trainability and Generalization in Deep Neural Networks [45.15453323967438]
We analyze the spectrum of the Neural Tangent Kernel (NTK) for trainability and generalization across a range of networks.
We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance (a small illustration appears at the end of this page).
arXiv Detail & Related papers (2019-12-30T18:53:24Z)
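The Receptive Field Regularization entry above is the direct predecessor of this paper, so a quick reference for the receptive-field arithmetic it builds on may help. The sketch below uses the standard recurrence for sequential conv/pool stacks (the RF grows by (kernel - 1) times the cumulative stride); the layer stack is a hypothetical VGG-like example, not an architecture from either paper.

```python
# Standard receptive-field arithmetic for a sequential CNN:
# for a layer with kernel size k and stride s, the RF grows by
# (k - 1) * jump, where "jump" is the cumulative stride so far.
def receptive_field(layers):
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Hypothetical VGG-like stack of (kernel, stride) pairs:
layers = [
    (3, 1), (3, 1), (2, 2),  # conv, conv, max-pool
    (3, 1), (3, 1), (2, 2),  # conv, conv, max-pool
    (3, 1), (3, 1),          # conv, conv
]
print(receptive_field(layers))  # 32 (input pixels per side)
```

Receptive-field regularization in this line of work amounts to choosing kernel sizes and strides so that this number stays in a range that generalizes well for spectrogram inputs.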
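To make the pooling distinction in the last entry concrete, the sketch below contrasts a classification head with global average pooling against a plain flatten head; the finding cited above is that the pooled variant often generalizes better. Both toy networks are invented for illustration and assume a fixed 32x32 input.

```python
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)

# Head with global average pooling: each channel is collapsed to a single
# value, discarding spatial layout before classification.
gap_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

# Head without pooling: the flattened feature map keeps every spatial
# position, so the model behaves more like a fully connected network.
flat_head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 32, 10))
```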