TC-SKNet with GridMask for Low-complexity Classification of Acoustic
scene
- URL: http://arxiv.org/abs/2210.02287v1
- Date: Wed, 5 Oct 2022 14:24:17 GMT
- Title: TC-SKNet with GridMask for Low-complexity Classification of Acoustic
scene
- Authors: Luyuan Xie, Yan Zhong, Lin Yang, Zhaoyu Yan, Zhonghai Wu, Junjie Wang
- Abstract summary: We combine Selective Kernel Network with Temporal-Convolution (TC-SKNet) to adjust the receptive field of convolution kernels.
GridMask is a data augmentation strategy by masking part of the raw data or feature area.
As a result, a peak accuracy of 59.87% TC-SKNet is equivalent to that of SOTA, but the parameters only use 20.9 K.
- Score: 15.010375209235924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolution neural networks (CNNs) have good performance in low-complexity
classification tasks such as acoustic scene classifications (ASCs). However,
there are few studies on the relationship between the length of target speech
and the size of the convolution kernels. In this paper, we combine Selective
Kernel Network with Temporal-Convolution (TC-SKNet) to adjust the receptive
field of convolution kernels to solve the problem of variable length of target
voice while keeping low-complexity. GridMask is a data augmentation strategy by
masking part of the raw data or feature area. It can enhance the generalization
of the model as the role of dropout. In our experiments, the performance gain
brought by GridMask is stronger than spectrum augmentation in ASCs. Finally, we
adopt AutoML to search best structure of TC-SKNet and hyperparameters of
GridMask for improving the classification performance. As a result, a peak
accuracy of 59.87% TC-SKNet is equivalent to that of SOTA, but the parameters
only use 20.9 K.
Related papers
- Scale-Invariant Object Detection by Adaptive Convolution with Unified Global-Local Context [3.061662434597098]
We propose an object detection model using a Switchable (adaptive) Atrous Convolutional Network (SAC-Net) based on the efficientDet model.
The proposed SAC-Net encapsulates the benefits of both low-level and high-level features to achieve improved performance on multi-scale object detection tasks.
Our experiments on benchmark datasets demonstrate that the proposed SAC-Net outperforms the state-of-the-art models by a significant margin in terms of accuracy.
arXiv Detail & Related papers (2024-09-17T10:08:37Z) - MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning [7.262751938473306]
Pruning is a well-established technique that reduces the size of neural networks while mathematically guaranteeing accuracy preservation.
We develop a new pruning algorithm, MPruner, that leverages mutual information through vector similarity.
MPruner achieved up to a 50% reduction in parameters and memory usage for CNN and transformer-based models, with minimal to no loss in accuracy.
arXiv Detail & Related papers (2024-08-24T05:54:47Z) - ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z) - Feature Selection using Sparse Adaptive Bottleneck Centroid-Encoder [1.2487990897680423]
We introduce a novel nonlinear model, Sparse Adaptive Bottleneckid-Encoder (SABCE), for determining the features that discriminate between two or more classes.
The algorithm is applied to various real-world data sets, including high-dimensional biological, image, speech, and accelerometer sensor data.
arXiv Detail & Related papers (2023-06-07T21:37:21Z) - Sharpend Cosine Similarity based Neural Network for Hyperspectral Image
Classification [0.456877715768796]
Hyperspectral Image Classification (HSIC) is a difficult task due to high inter and intra-class similarity and variability, nested regions, and overlapping.
2D Convolutional Neural Networks (CNN) emerged as a viable network whereas, 3D CNNs are a better alternative due to accurate classification.
This paper introduces Sharpened Cosine Similarity (SCS) concept as an alternative to convolutions in a Neural Network for HSIC.
arXiv Detail & Related papers (2023-05-26T07:04:00Z) - Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z) - ClusTR: Exploring Efficient Self-attention via Clustering for Vision
Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z) - MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for
Real-Time Semantic Segmentation [27.232578592161673]
We devise a novel lightweight network using a multi-scale context fusion scheme (MSCFNet)
The proposed MSCFNet contains only 1.15M parameters, achieves 71.9% Mean IoU and can run at over 50 FPS on a single Titan XP GPU configuration.
arXiv Detail & Related papers (2021-03-24T08:28:26Z) - Intermediate Loss Regularization for CTC-based Speech Recognition [58.33721897180646]
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification ( CTC) objective.
We evaluate the proposed method on various corpora, reaching word error rate (WER) 9.9% on the WSJ corpus and character error rate (CER) 5.2% on the AISHELL-1 corpus respectively.
arXiv Detail & Related papers (2021-02-05T15:01:03Z) - Improved Mask-CTC for Non-Autoregressive End-to-End ASR [49.192579824582694]
Recently proposed end-to-end ASR system based on mask-predict with connectionist temporal classification (CTC)
We propose to enhance the network architecture by employing a recently proposed architecture called Conformer.
Next, we propose new training and decoding methods by introducing auxiliary objective to predict the length of a partial target sequence.
arXiv Detail & Related papers (2020-10-26T01:22:35Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs now maintain performance with dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.