Receptive Field Regularization Techniques for Audio Classification and
Tagging with Deep Convolutional Neural Networks
- URL: http://arxiv.org/abs/2105.12395v1
- Date: Wed, 26 May 2021 08:36:29 GMT
- Title: Receptive Field Regularization Techniques for Audio Classification and
Tagging with Deep Convolutional Neural Networks
- Authors: Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer
- Abstract summary: We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization.
We propose several systematic approaches to control the RF of CNNs and systematically test the resulting architectures.
- Score: 7.9495796547433395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the performance of variants of well-known
Convolutional Neural Network (CNN) architectures on different audio tasks. We
show that tuning the Receptive Field (RF) of CNNs is crucial to their
generalization. An insufficient RF limits the CNN's ability to fit the training
data. In contrast, CNNs with an excessive RF tend to over-fit the training data
and fail to generalize to unseen testing data. As state-of-the-art CNN
architectures (in computer vision and other domains) tend to go deeper in terms
of the number of layers, their RF size increases, and their performance therefore
degrades in several audio classification and tagging tasks. We study
well-known CNN architectures and how their building blocks affect their
receptive field. We propose several systematic approaches to control the RF of
CNNs and systematically test the resulting architectures on different audio
classification and tagging tasks and datasets. The experiments show that
regularizing the RF of CNNs using our proposed approaches can drastically
improve the generalization of models, outperforming complex architectures and
pre-trained models on larger datasets. The proposed CNNs achieve
state-of-the-art results in multiple tasks, from acoustic scene classification
to emotion and theme detection in music to instrument recognition, as
demonstrated by top ranks in several pertinent challenges (DCASE, MediaEval).
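To make the role of the RF concrete, the following is a minimal sketch (a reader's illustration, not taken from the paper) of how the analytic receptive field of a stack of convolutional layers can be computed, and how switching the deeper layers to 1x1 kernels stops the RF from growing further. The layer stacks and the `receptive_field` helper are illustrative assumptions; the paper's actual architectures and RF-control methods may differ.

```python
# Sketch: analytic receptive field (RF) of a chain of convolutional layers.
# Standard recurrence: rf += (kernel - 1) * jump, then jump *= stride.
# The example layer stacks below are hypothetical, not the paper's models.

def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, applied input-to-output."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# A deep vision-style stack: 3x3 convs with stride-2 downsampling every other layer.
deep = [(3, 1), (3, 2)] * 8
print("deep stack RF:", receptive_field(deep))                # 1021: very large RF

# One way to cap the RF: keep the early layers, but switch the later
# blocks to 1x1 kernels so the RF stops growing.
regularized = [(3, 1), (3, 2)] * 4 + [(1, 1)] * 8
print("RF-regularized stack RF:", receptive_field(regularized))  # stays at 61
```

Under these assumed stacks, the deep variant reaches an RF of over a thousand input bins, while the variant with 1x1 kernels in its second half keeps the RF at the value reached by its first eight layers; restricting RF growth in the deeper layers is the general flavour of what the abstract calls RF regularization.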
Related papers
- Transferability of Convolutional Neural Networks in Stationary Learning
Tasks [96.00428692404354]
We introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining.
Our results show that the CNN can tackle problems with many hundreds of agents after being trained with fewer than ten.
arXiv Detail & Related papers (2023-07-21T13:51:45Z)
- SAR Despeckling Using Overcomplete Convolutional Networks [53.99620005035804]
Despeckling is an important problem in remote sensing, as speckle degrades SAR images.
Recent studies show that convolutional neural networks (CNNs) outperform classical despeckling methods.
This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field.
We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images.
arXiv Detail & Related papers (2022-05-31T15:55:37Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by
Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- On the Performance of Convolutional Neural Networks under High and Low
Frequency Information [13.778851745408133]
We study the performance of CNN models on the high- and low-frequency information of images.
We propose filtering-based data augmentation during training.
We observe satisfactory improvements in robustness and low-frequency generalization.
arXiv Detail & Related papers (2020-10-30T17:54:45Z)
- Receptive-Field Regularized CNNs for Music Classification and Tagging [8.188197619481466]
We present a principled way to make deep architectures like ResNet competitive for music-related tasks, based on well-designed regularization strategies.
In particular, we analyze the recently introduced Receptive-Field Regularization and Shake-Shake, and show that they significantly improve the generalization of deep CNNs on music-related tasks.
arXiv Detail & Related papers (2020-07-27T12:48:12Z)
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for
Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, CNN-TT, maintains good enhancement quality with a reduced number of model parameters.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing (low-pass) filters.
As the amount of information in the feature maps increases over the course of training, the network progressively learns better representations of the data (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Convolution Neural Network Architecture Learning for Remote Sensing
Scene Classification [22.29957803992306]
This paper proposes an automatic architecture-learning procedure for remote sensing scene classification.
We introduce a learning strategy that allows efficient search of the architecture space by means of gradient descent.
An architecture generator finally maps the set of parameters into the CNN used in our experiments.
arXiv Detail & Related papers (2020-01-27T07:42:46Z) - Inferring Convolutional Neural Networks' accuracies from their
architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that these architectural attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
arXiv Detail & Related papers (2020-01-07T16:41:58Z)
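The Curriculum By Smoothing entry above describes low-pass filtering of a CNN's intermediate feature maps, with the filtering strength reduced as training progresses. Below is a minimal, hedged sketch of that idea in PyTorch; the `SmoothedConvBlock` module, the fixed 5x5 kernel size, and the geometric sigma schedule are illustrative assumptions rather than that paper's exact implementation.

```python
# Sketch of the curriculum-by-smoothing idea: blur the feature maps with a
# Gaussian kernel whose standard deviation is annealed towards zero during
# training, so early epochs see smoothed features and later epochs see detail.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel2d(sigma: float, size: int = 5) -> torch.Tensor:
    """Normalized 2-D Gaussian kernel of shape (size, size)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return kernel / kernel.sum()


class SmoothedConvBlock(nn.Module):
    """Conv -> ReLU -> Gaussian low-pass filtering of the feature maps."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.sigma = 1.0  # annealed towards 0 by the training loop

    def forward(self, x):
        x = F.relu(self.conv(x))
        if self.sigma > 1e-3:  # once sigma is ~0, skip the smoothing entirely
            k = gaussian_kernel2d(self.sigma).to(x.device)
            k = k.view(1, 1, *k.shape).repeat(x.shape[1], 1, 1, 1)  # one kernel per channel
            x = F.conv2d(x, k, padding=k.shape[-1] // 2, groups=x.shape[1])
        return x


# Usage with an assumed geometric annealing schedule: sigma_t = 0.9**epoch.
block = SmoothedConvBlock(1, 16)
for epoch in range(3):
    block.sigma = 0.9 ** epoch
    out = block(torch.randn(4, 1, 64, 64))  # e.g. batches of spectrogram patches
    print(epoch, round(block.sigma, 3), tuple(out.shape))
```

The depthwise (grouped) convolution applies the same fixed Gaussian to every channel, so only the conv weights are learned; the blur acts purely as an annealed low-pass regularizer.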