Acoustic Scene Classification with Squeeze-Excitation Residual Networks
- URL: http://arxiv.org/abs/2003.09284v3
- Date: Fri, 26 Jun 2020 09:10:47 GMT
- Title: Acoustic Scene Classification with Squeeze-Excitation Residual Networks
- Authors: Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello and
Maximo Cobos
- Abstract summary: We propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework based on residual learning.
The behavior of the block that implements such operators and, therefore, the entire neural network, can be modified depending on the input to the block.
- Score: 4.591851728010269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic scene classification (ASC) is a machine listening
problem whose objective is to tag an audio clip with a
predefined label describing a scene location (e.g., park, airport). Many
state-of-the-art solutions to ASC incorporate data augmentation techniques and
model ensembles. However, considerable improvements can also be achieved only
by modifying the architecture of convolutional neural networks (CNNs). In this
work we propose two novel squeeze-excitation blocks to improve the accuracy of
a CNN-based ASC framework based on residual learning. The main idea of
squeeze-excitation blocks is to learn spatial and channel-wise feature maps
independently instead of jointly as standard CNNs do. This is usually achieved
by some global grouping operators, linear operators and a final calibration
between the input of the block and its obtained relationships. The behavior of
the block that implements such operators and, therefore, the entire neural
network, can be modified depending on the input to the block, the established
residual configurations and the selected non-linear activations. The analysis
has been carried out using the TAU Urban Acoustic Scenes 2019 dataset
(https://zenodo.org/record/2589280) presented in the 2019 edition of the DCASE
challenge. All configurations discussed in this document exceed the performance
of the baseline proposed by the DCASE organization by 13 percentage points.
In turn, the novel configurations proposed in this paper outperform the
residual configurations proposed in previous works.
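
The pipeline described in the abstract (a global grouping operator, linear operators, and a final recalibration of the block input) can be illustrated with a minimal sketch of a standard channel squeeze-excitation block. This is not the authors' novel configuration: the reduction ratio, the ReLU and sigmoid activations, the random weights, and all function names here are assumptions chosen for illustration, and plain Python lists stand in for tensors.

```python
import math
import random


def squeeze(feature_maps):
    """Global average pooling: one scalar per channel of a (C, H, W) input."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def linear(weights, bias, vector):
    """Dense layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def relu(vector):
    return [max(0.0, v) for v in vector]


def sigmoid(vector):
    return [1.0 / (1.0 + math.exp(-v)) for v in vector]


def se_block(feature_maps, w1, b1, w2, b2):
    """Channel squeeze-excitation: pool, bottleneck MLP, then rescale."""
    z = squeeze(feature_maps)                # shape (C,)
    hidden = relu(linear(w1, b1, z))         # shape (C // r,)
    gates = sigmoid(linear(w2, b2, hidden))  # shape (C,), each in (0, 1)
    # Recalibration: every channel of the input is scaled by its own gate.
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: uniform random values, illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
channels, height, width, reduction = 4, 3, 5, 2  # assumed toy sizes
x = [random_matrix(height, width, rng) for _ in range(channels)]
w1 = random_matrix(channels // reduction, channels, rng)
b1 = [0.0] * (channels // reduction)
w2 = random_matrix(channels, channels // reduction, rng)
b2 = [0.0] * channels

y = se_block(x, w1, b1, w2, b2)  # same shape as x, channels rescaled
```

In a residual network, the output of such a block would typically be combined with a shortcut connection; where that addition happens and which activation follows it are the kinds of choices the abstract refers to as residual configurations.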
Related papers
- Condition-Invariant Semantic Segmentation [77.10045325743644]
We implement Condition-Invariant Semantic Segmentation (CISS) on the current state-of-the-art domain adaptation architecture.
Our method achieves the second-best performance on the normal-to-adverse Cityscapes→ACDC benchmark.
CISS is shown to generalize well to domains unseen during training, such as BDD100K-night and ACDC-night.
arXiv Detail & Related papers (2023-05-27T03:05:07Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Deep Neural Decision Forest for Acoustic Scene Classification [45.886356124352226]
Acoustic scene classification (ASC) aims to classify an audio clip based on the characteristic of the recording environment.
We propose a novel approach for ASC using a deep neural decision forest (DNDF).
arXiv Detail & Related papers (2022-03-07T14:39:42Z)
- Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to enhance the transparency of the function of the whole network.
We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z)
- Exploring Novel Pooling Strategies for Edge Preserved Feature Maps in Convolutional Neural Networks [0.0]
Anti-aliased convolutional neural networks (CNNs) have renewed interest in rethinking how pooling is done in CNNs.
Two novel pooling approaches are presented: Laplacian-Gaussian Concatenation with Attention (LGCA) pooling and Wavelet-based Approximate-Detailed Concatenation with Attention (WADCA) pooling.
Results suggest that the proposed pooling approaches outperform the conventional pooling as well as blur pooling for classification, segmentation and autoencoders.
arXiv Detail & Related papers (2021-10-17T15:11:51Z)
- Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification [54.57150493905063]
Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal is recorded.
We propose a robust feature learning (RFL) framework to train the CNN.
arXiv Detail & Related papers (2021-08-11T03:33:05Z)
- SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection [32.159784061961886]
Temporal action detection (TAD) is a challenging task which aims to temporally localize and recognize the human action in untrimmed videos.
Current mainstream one-stage TAD approaches localize and classify action proposals relying on pre-defined anchors.
A novel TAD model termed Selective Receptive Field Network (SRF-Net) is developed.
arXiv Detail & Related papers (2021-06-29T11:29:16Z)
- A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights on the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
- Cross-scale Attention Model for Acoustic Event Classification [45.15898265162008]
We propose a cross-scale attention (CSA) model, which explicitly integrates features from different scales to form the final representation.
We show that the proposed CSA model can effectively improve the performance of current state-of-the-art deep learning algorithms.
arXiv Detail & Related papers (2019-12-27T07:28:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.