Improving Sound Event Classification by Increasing Shift Invariance in
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2107.00623v1
- Date: Thu, 1 Jul 2021 17:21:02 GMT
- Title: Improving Sound Event Classification by Increasing Shift Invariance in
Convolutional Neural Networks
- Authors: Eduardo Fonseca, Andres Ferraro, Xavier Serra
- Abstract summary: Recent studies have called into question the commonly assumed shift invariance property of convolutional networks.
We evaluate two methods to improve shift invariance in CNNs, based on low-pass filtering and adaptive sampling of incoming feature maps.
We show that these modifications consistently improve sound event classification in all cases considered, without adding any (or adding very few) trainable parameters.
- Score: 14.236193187116047
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have called into question the commonly assumed shift
invariance property of convolutional networks, showing that small shifts in the
input can substantially affect the output predictions. In this paper, we ask
whether the lack of shift invariance is a problem in sound event classification,
and whether there are benefits in addressing it. Specifically, we evaluate two pooling
methods to improve shift invariance in CNNs, based on low-pass filtering and
adaptive sampling of incoming feature maps. These methods are implemented via
small architectural modifications inserted into the pooling layers of CNNs. We
evaluate the effect of these architectural changes on the FSD50K dataset using
models of different capacity and in the presence of strong regularization. We show
that these modifications consistently improve sound event classification in all
cases considered, without adding any (or adding very few) trainable parameters,
which makes them an appealing alternative to conventional pooling layers. The
outcome is a new state-of-the-art mAP of 0.541 on the FSD50K classification
benchmark.
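The abstract describes the two pooling variants only at a high level; the sketch below makes them concrete. It is an illustrative PyTorch implementation, not the authors' released code: LowPassPool applies dense max pooling and then blurs with a fixed 3x3 binomial kernel before subsampling (low-pass filtering), while APSPool keeps the polyphase component of the feature map with the largest norm (adaptive sampling). Neither adds trainable parameters; the module names, the kernel choice, and the stride-2 setting are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowPassPool(nn.Module):
    """Max pool with stride 1, then blur with a fixed binomial kernel and subsample."""

    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.channels = channels
        self.stride = stride
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k) / 16.0                      # 3x3 binomial low-pass kernel, no trainable parameters
        self.register_buffer("kernel", k.expand(channels, 1, 3, 3).clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.max_pool2d(x, kernel_size=2, stride=1)      # dense (shift-equivariant) pooling
        return F.conv2d(x, self.kernel, stride=self.stride,
                        padding=1, groups=self.channels)  # anti-alias, then subsample


class APSPool(nn.Module):
    """Adaptive polyphase sampling: keep the polyphase component with the largest l2 norm."""

    def __init__(self, stride: int = 2):
        super().__init__()
        self.stride = stride

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.stride
        # Enumerate the s*s polyphase components (H and W assumed divisible by s).
        phases = torch.stack([x[:, :, i::s, j::s] for i in range(s) for j in range(s)], dim=1)
        norms = phases.flatten(2).norm(dim=2)             # (batch, s*s) per-component norms
        idx = norms.argmax(dim=1)                         # data-dependent choice of sampling grid
        return phases[torch.arange(x.size(0)), idx]


if __name__ == "__main__":
    x = torch.randn(4, 64, 96, 100)                       # e.g. a batch of log-mel feature maps
    print(LowPassPool(64)(x).shape)                       # torch.Size([4, 64, 48, 50])
    print(APSPool()(x).shape)                             # torch.Size([4, 64, 48, 50])
```

Either module can stand in for a strided pooling layer inside an existing CNN, which is how the paper frames the modifications: small architectural changes inserted into the pooling layers.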
Related papers
- Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling [14.731788603429774]
Downsampling operators break the shift invariance of convolutional neural networks (CNNs).
We propose a learnable pooling operator called Translation Invariant Polyphase Sampling (TIPS).
TIPS results in consistent performance gains in terms of accuracy, shift consistency, and shift fidelity.
arXiv Detail & Related papers (2024-04-11T00:49:38Z)
- Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains [23.10912424714101]
The recently discovered Neural Collapse (NC) phenomenon states that the last-layer weights of deep neural networks converge to the so-called Equiangular Tight Frame (ETF) simplex at the terminal phase of training.
Inspired by NC, we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to an ETF.
arXiv Detail & Related papers (2024-02-28T15:52:30Z)
- Balanced Classification: A Unified Framework for Long-Tailed Object Detection [74.94216414011326]
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories.
We introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution.
BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
arXiv Detail & Related papers (2023-08-04T09:11:07Z)
- Variational Classification [51.2541371924591]
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution instead of the one implicitly assumed by a standard softmax layer.
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Fuzzy Pooling [7.6146285961466]
Convolutional Neural Networks (CNNs) are artificial learning systems typically based on two operations: convolution and pooling.
We present a novel pooling operation based on (type-1) fuzzy sets to cope with the local imprecision of the feature maps.
Experiments using publicly available datasets show that the proposed approach can enhance the classification performance of a CNN.
arXiv Detail & Related papers (2022-02-12T11:18:32Z)
- Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- ECINN: Efficient Counterfactuals from Invertible Neural Networks [80.94500245955591]
We propose a method, ECINN, that utilizes the generative capacities of invertible neural networks for image classification to generate counterfactual examples efficiently.
ECINN has a closed-form expression and generates a counterfactual in the time of only two network evaluations.
Our experiments demonstrate how ECINN alters class-dependent image regions to change the perceptual and predicted class of the counterfactuals.
arXiv Detail & Related papers (2021-03-25T09:23:24Z)
- Truly shift-invariant convolutional neural networks [0.0]
Recent works have shown that the output of a CNN can change significantly with small shifts in input.
We propose adaptive polyphase sampling (APS), a simple sub-sampling scheme that allows convolutional neural networks to achieve 100% consistency in classification performance under shifts.
arXiv Detail & Related papers (2020-11-28T20:57:35Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a simple technique, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
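Prediction-time batch normalization amounts to re-estimating the BatchNorm statistics from the incoming test batch instead of using the running averages accumulated during training. Below is a minimal sketch of that idea for a PyTorch model with standard BatchNorm layers; the function name and the handling of the normalization layers are illustrative assumptions, not the paper's reference code.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict_with_batch_stats(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """One forward pass in which BatchNorm layers normalize with the current batch's statistics."""
    was_training = model.training
    model.eval()                                        # everything else (dropout, etc.) stays in eval mode
    bn_layers = [m for m in model.modules()
                 if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))]
    saved = [(m.training, m.track_running_stats) for m in bn_layers]
    for m in bn_layers:
        m.train()                                       # use batch statistics for normalization
        m.track_running_stats = False                   # do not overwrite the stored running averages
    out = model(x)
    for m, (training, tracking) in zip(bn_layers, saved):
        m.train(training)
        m.track_running_stats = tracking
    model.train(was_training)
    return out
```

Applied to a batch of covariate-shifted test inputs, this swaps the train-time statistics for those of the test batch, which is the effect the paper studies.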
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.