Receptive Field Regularization Techniques for Audio Classification and
Tagging with Deep Convolutional Neural Networks
- URL: http://arxiv.org/abs/2105.12395v1
- Date: Wed, 26 May 2021 08:36:29 GMT
- Title: Receptive Field Regularization Techniques for Audio Classification and
Tagging with Deep Convolutional Neural Networks
- Authors: Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer
- Abstract summary: We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization.
We propose several systematic approaches to control the RF of CNNs and systematically test the resulting architectures.
- Score: 7.9495796547433395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the performance of variants of well-known
Convolutional Neural Network (CNN) architectures on different audio tasks. We
show that tuning the Receptive Field (RF) of CNNs is crucial to their
generalization. An insufficient RF limits the CNN's ability to fit the training
data. In contrast, CNNs with an excessive RF tend to over-fit the training data
and fail to generalize to unseen testing data. As state-of-the-art CNN
architectures (in computer vision and other domains) tend to go deeper in terms
of the number of layers, their RF size increases, and their performance therefore
degrades in several audio classification and tagging tasks. We study
well-known CNN architectures and how their building blocks affect their
receptive field. We propose several systematic approaches to control the RF of
CNNs and systematically test the resulting architectures on different audio
classification and tagging tasks and datasets. The experiments show that
regularizing the RF of CNNs using our proposed approaches can drastically
improve the generalization of models, outperforming complex architectures and
pre-trained models on larger datasets. The proposed CNNs achieve
state-of-the-art results in multiple tasks, from acoustic scene classification
to emotion and theme detection in music to instrument recognition, as
demonstrated by top ranks in several pertinent challenges (DCASE, MediaEval).
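To make the role of the RF concrete, the following is a minimal sketch (a reader's illustration, not taken from the paper) of how the analytic receptive field of a stack of convolutional layers can be computed, and how switching the deeper layers to 1x1 kernels stops the RF from growing further. The layer stacks and the `receptive_field` helper are illustrative assumptions; the paper's actual architectures and RF-control methods may differ.

```python
# Sketch: analytic receptive field (RF) of a chain of convolutional layers.
# Standard recurrence: rf += (kernel - 1) * jump, then jump *= stride.
# The example layer stacks below are hypothetical, not the paper's models.

def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, applied input-to-output."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# A deep vision-style stack: 3x3 convs with stride-2 downsampling every other layer.
deep = [(3, 1), (3, 2)] * 8
print("deep stack RF:", receptive_field(deep))                # 1021: very large RF

# One way to cap the RF: keep the early layers, but switch the later
# blocks to 1x1 kernels so the RF stops growing.
regularized = [(3, 1), (3, 2)] * 4 + [(1, 1)] * 8
print("RF-regularized stack RF:", receptive_field(regularized))  # stays at 61
```

Under these assumed stacks, the deep variant reaches an RF of over a thousand input bins, while the variant with 1x1 kernels in its second half keeps the RF at the value reached by its first eight layers; restricting RF growth in the deeper layers is the general flavour of what the abstract calls RF regularization.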
Related papers
- Transferability of Convolutional Neural Networks in Stationary Learning
Tasks [96.00428692404354]
We introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining.
Our results show that the CNN can tackle problems with many hundreds of agents after being trained with fewer than ten.
arXiv Detail & Related papers (2023-07-21T13:51:45Z)
- SAR Despeckling Using Overcomplete Convolutional Networks [53.99620005035804]
Despeckling is an important problem in remote sensing, as speckle degrades SAR images.
Recent studies show that convolutional neural networks (CNNs) outperform classical despeckling methods.
This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field.
We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images.
arXiv Detail & Related papers (2022-05-31T15:55:37Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by
Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- On the Performance of Convolutional Neural Networks under High and Low
Frequency Information [13.778851745408133]
We study the performance of CNN models on the high- and low-frequency information of images.
We propose filtering-based data augmentation during training.
We observe satisfactory improvements in robustness and low-frequency generalization.
arXiv Detail & Related papers (2020-10-30T17:54:45Z)
- Receptive-Field Regularized CNNs for Music Classification and Tagging [8.188197619481466]
We present a principled way to make deep architectures like ResNet competitive for music-related tasks, based on well-designed regularization strategies.
In particular, we analyze the recently introduced Receptive-Field Regularization and Shake-Shake, and show that they significantly improve the generalization of deep CNNs on music-related tasks.
arXiv Detail & Related papers (2020-07-27T12:48:12Z)
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for
Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, CNN-TT, maintains good enhancement quality with a reduced number of model parameters.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing (low-pass) filters.
As the amount of information in the feature maps increases over the course of training, the network progressively learns better representations of the data (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Convolution Neural Network Architecture Learning for Remote Sensing
Scene Classification [22.29957803992306]
This paper proposes an automatic architecture-learning procedure for remote sensing scene classification.
We introduce a learning strategy that allows efficient search of the architecture space by means of gradient descent.
An architecture generator finally maps the set of parameters into the CNN used in our experiments.
arXiv Detail & Related papers (2020-01-27T07:42:46Z) - Inferring Convolutional Neural Networks' accuracies from their
architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that these architectural attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
arXiv Detail & Related papers (2020-01-07T16:41:58Z)
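The Curriculum By Smoothing entry above describes low-pass filtering of a CNN's intermediate feature maps, with the filtering strength reduced as training progresses. Below is a minimal, hedged sketch of that idea in PyTorch; the `SmoothedConvBlock` module, the fixed 5x5 kernel size, and the geometric sigma schedule are illustrative assumptions rather than that paper's exact implementation.

```python
# Sketch of the curriculum-by-smoothing idea: blur the feature maps with a
# Gaussian kernel whose standard deviation is annealed towards zero during
# training, so early epochs see smoothed features and later epochs see detail.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel2d(sigma: float, size: int = 5) -> torch.Tensor:
    """Normalized 2-D Gaussian kernel of shape (size, size)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return kernel / kernel.sum()


class SmoothedConvBlock(nn.Module):
    """Conv -> ReLU -> Gaussian low-pass filtering of the feature maps."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.sigma = 1.0  # annealed towards 0 by the training loop

    def forward(self, x):
        x = F.relu(self.conv(x))
        if self.sigma > 1e-3:  # once sigma is ~0, skip the smoothing entirely
            k = gaussian_kernel2d(self.sigma).to(x.device)
            k = k.view(1, 1, *k.shape).repeat(x.shape[1], 1, 1, 1)  # one kernel per channel
            x = F.conv2d(x, k, padding=k.shape[-1] // 2, groups=x.shape[1])
        return x


# Usage with an assumed geometric annealing schedule: sigma_t = 0.9**epoch.
block = SmoothedConvBlock(1, 16)
for epoch in range(3):
    block.sigma = 0.9 ** epoch
    out = block(torch.randn(4, 1, 64, 64))  # e.g. batches of spectrogram patches
    print(epoch, round(block.sigma, 3), tuple(out.shape))
```

The depthwise (grouped) convolution applies the same fixed Gaussian to every channel, so only the conv weights are learned; the blur acts purely as an annealed low-pass regularizer.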