Cross-scale Attention Model for Acoustic Event Classification
- URL: http://arxiv.org/abs/1912.12011v2
- Date: Mon, 15 Jun 2020 21:10:38 GMT
- Title: Cross-scale Attention Model for Acoustic Event Classification
- Authors: Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai
- Abstract summary: We propose a cross-scale attention (CSA) model, which explicitly integrates features from different scales to form the final representation.
We show that the proposed CSA model can effectively improve the performance of current state-of-the-art deep learning algorithms.
- Score: 45.15898265162008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major advantage of a deep convolutional neural network (CNN) is that the
focused receptive field size is increased by stacking multiple convolutional
layers. Accordingly, the model can explore the long-range dependency of
features from the top layers. However, a potential limitation of the network is
that the discriminative features from the bottom layers (which can model the
short-range dependency) are smoothed out in the final representation. This
limitation is especially evident in the acoustic event classification (AEC)
task, where both short- and long-duration events are involved in an audio clip
and need to be classified. In this paper, we propose a cross-scale attention
(CSA) model, which explicitly integrates features from different scales to form
the final representation. Moreover, we propose the adoption of the attention
mechanism to specify the weights of local and global features based on the
spatial and temporal characteristics of acoustic events. Using mathematical
formulations, we further reveal that the proposed CSA model can be regarded as
a weighted residual CNN (ResCNN) model when the ResCNN is used as a backbone
model. We tested the proposed model on two AEC datasets: one is an urban AEC
task, and the other is an AEC task in smart car environments. Experimental
results show that the proposed CSA model can effectively improve the
performance of current state-of-the-art deep learning algorithms.
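The abstract's weighted-residual view can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's exact formulation: the gate here is a single sigmoid scalar computed from average-pooled local (bottom-layer, short-range) and global (top-layer, long-range) features, and the output is the gated local features added residually to the global ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_scale_attention(local_feat, global_feat, w, b=0.0):
    """Toy cross-scale attention (illustrative, not the paper's model).

    local_feat, global_feat: arrays of shape (time, channels).
    w: weight vector of shape (2 * channels,) for the attention gate.
    Returns the weighted-residual output and the gate value.
    """
    # Average-pool each scale over time to summarize its content.
    ctx = np.concatenate([local_feat.mean(axis=0), global_feat.mean(axis=0)])
    alpha = sigmoid(ctx @ w + b)  # scalar gate in (0, 1)
    # Weighted residual: scale the local features before the skip addition.
    return alpha * local_feat + global_feat, alpha

rng = np.random.default_rng(0)
T, C = 8, 4
local = rng.standard_normal((T, C))   # stands in for bottom-layer features
glob = rng.standard_normal((T, C))    # stands in for top-layer features
w = rng.standard_normal(2 * C)        # stands in for learned gate weights
out, alpha = cross_scale_attention(local, glob, w)
print(out.shape)
```

With `alpha` fixed at 1 this reduces to a plain residual connection, which is why the CSA model can be read as a weighted ResCNN when ResCNN is the backbone.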
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning [38.09011520275557]
Class-incremental learning (CIL) aims to train a model to learn new classes from non-stationary data streams without forgetting old ones.
We propose a new kind of connectionist model by tailoring neural unit dynamics that adapt the behavior of neural networks for CIL.
arXiv Detail & Related papers (2024-06-04T15:47:03Z)
- CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks [19.468704622654357]
We present a channel-wise spatially autocorrelated (CSA) attention mechanism for deep CNNs.
Inspired by geographical analysis, the proposed CSA exploits the spatial relationships between channels of feature maps to produce an effective channel descriptor.
We validate the effectiveness of the proposed CSA networks through extensive experiments and analysis on ImageNet, and MS COCO benchmark datasets.
arXiv Detail & Related papers (2024-05-09T13:21:03Z)
- TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment [53.72721476803585]
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks.
We propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions.
A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower level features.
arXiv Detail & Related papers (2023-08-06T09:08:37Z)
- Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim)
We show that adding STAC modules to ResNet style architectures can result in up to a 1.6% increase in top-1 accuracy.
Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance.
arXiv Detail & Related papers (2023-06-16T18:29:26Z)
- ASU-CNN: An Efficient Deep Architecture for Image Classification and Feature Visualizations [0.0]
Activation functions play a decisive role in determining the capacity of Deep Neural Networks.
In this paper, a Convolutional Neural Network model named ASU-CNN is proposed.
The network achieved promising results on both training and testing data for the classification of CIFAR-10.
arXiv Detail & Related papers (2023-05-28T16:52:25Z)
- Research on Dual Channel News Headline Classification Based on ERNIE Pre-training Model [13.222137788045416]
The proposed model improves the accuracy, precision, and F1-score of news headline classification compared with traditional neural network models.
It can perform well in the multi-classification application of news headline text under large data volume.
arXiv Detail & Related papers (2022-02-14T10:44:12Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Learning Deep Interleaved Networks with Asymmetric Co-Attention for Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction.
In this paper, we propose asymmetric co-attention (AsyCA) which is attached at each interleaved node to model the feature dependencies.
Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
arXiv Detail & Related papers (2020-10-29T15:32:00Z)
- Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.