Domain Generalization on Efficient Acoustic Scene Classification using
Residual Normalization
- URL: http://arxiv.org/abs/2111.06531v1
- Date: Fri, 12 Nov 2021 01:57:36 GMT
- Title: Domain Generalization on Efficient Acoustic Scene Classification using
Residual Normalization
- Authors: Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang
- Abstract summary: Handling multi-device audio inputs with a single, efficiently designed acoustic scene classification system is a practical research topic.
We propose Residual Normalization, a novel feature normalization method that uses frequency-wise instance normalization with a shortcut path to discard unnecessary device-specific information.
The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters.
- Score: 10.992151305603267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handling multi-device audio inputs with a single, efficiently designed
acoustic scene classification system is a practical research topic. In this
work, we propose Residual Normalization, a novel feature normalization method
that uses frequency-wise instance normalization with a shortcut path to
discard unnecessary device-specific information without losing useful
information for classification. Moreover, we introduce an efficient
architecture, BC-ResNet-ASC, a modified version of the baseline architecture
with a limited receptive field. BC-ResNet-ASC outperforms the baseline
architecture even though it contains a small number of parameters. Through
three model compression schemes (pruning, quantization, and knowledge
distillation), we can reduce model complexity further while mitigating the
performance degradation. The proposed system achieves an average test accuracy
of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with
315k parameters, and an average test accuracy of 75.3% after compression to
61.0KB of non-zero parameters. The proposed method won 1st place in the DCASE
2021 challenge, Task 1A.
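The normalization idea in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation. It assumes a feature layout of (batch, channels, frequency, time), assumes the statistics are computed per instance and per frequency bin over channels and time, and assumes the shortcut path is the input scaled by a weight `lam`. The names `freq_instance_norm`, `residual_norm`, and `lam` are hypothetical and chosen only for this example.

```python
import numpy as np


def freq_instance_norm(x, eps=1e-5):
    """Normalize each (instance, frequency bin) slice over channels and time.

    Assumed layout of x: (batch, channels, frequency, time).
    """
    mean = x.mean(axis=(1, 3), keepdims=True)
    var = x.var(axis=(1, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def residual_norm(x, lam=0.1, eps=1e-5):
    """Add a scaled identity shortcut to the frequency-wise normalized input.

    lam is a hypothetical shortcut weight; a real model might learn it.
    """
    return lam * x + freq_instance_norm(x, eps)


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 4, 8, 16))

# Simulate a device-specific frequency response as a per-bin gain and offset.
gain = rng.uniform(0.5, 2.0, size=(1, 1, 8, 1))
offset = rng.normal(size=(1, 1, 8, 1))
device_features = gain * features + offset

normalized = freq_instance_norm(device_features)
out = residual_norm(device_features, lam=0.1)
```

In this sketch the normalized term is nearly unchanged by a per-bin gain and offset, which is one way to read "discarding device-specific information", while the shortcut term keeps part of the original signal so that information useful for classification is not removed entirely.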
Related papers
- Asca: less audio data is more insightful [10.354385253247761]
We introduce the Audio Spectrogram Convolution Attention (ASCA) based on CoAtNet.
On BirdCLEF2023 and AudioSet (Balanced), ASCA achieved accuracies of 81.2% and 35.1%, respectively.
The unique structure of our model enriches output, enabling generalization across various audio detection tasks.
arXiv Detail & Related papers (2023-09-23T13:24:06Z)
- Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
- Sub-8-bit quantization for on-device speech recognition: a regularization-free approach [19.84792318335999]
General Quantizer (GQ) is a regularization-free, "soft-to-hard" compression mechanism with self-adjustable centroids.
GQ can compress both RNN-T and Conformer into sub-8-bit, and for some RNN-T layers, to 1-bit for fast and accurate inference.
arXiv Detail & Related papers (2022-10-17T15:42:26Z)
- QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design [11.412720572948087]
The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity.
This report introduces four methods to achieve the goal.
The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters.
arXiv Detail & Related papers (2022-06-28T11:42:52Z)
- FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
- Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices [59.86658316440461]
We present a robust and low-complexity system for Acoustic Scene Classification (ASC).
We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue.
To further improve performance while keeping the model low-complexity, we apply two techniques: an ensemble of multiple spectrograms and channel reduction.
arXiv Detail & Related papers (2022-03-23T10:27:41Z)
- Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection [0.0]
Reconstruction autoencoder-based methods deal with the problem by using input reconstruction error as a metric of novelty vs. normality.
We introduce semantic reconstruction, data certainty decomposition and normalized L2 distance to substantially improve original methods.
Our method works without any additional data, hard-to-implement structure, or time-consuming pipeline, and without harming the classification accuracy of known classes.
arXiv Detail & Related papers (2022-03-04T09:04:55Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
- A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
arXiv Detail & Related papers (2021-07-03T16:25:24Z)
- Capturing scattered discriminative information using a deep architecture in acoustic scene classification [49.86640645460706]
In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
arXiv Detail & Related papers (2020-07-09T08:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.