QTI Submission to DCASE 2021: residual normalization for
device-imbalanced acoustic scene classification with efficient design
- URL: http://arxiv.org/abs/2206.13909v1
- Date: Tue, 28 Jun 2022 11:42:52 GMT
- Title: QTI Submission to DCASE 2021: residual normalization for
device-imbalanced acoustic scene classification with efficient design
- Authors: Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang
- Abstract summary: The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity.
This report introduces four methods to achieve the goal.
The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters.
- Score: 11.412720572948087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This technical report describes our Task 1A submission to the
DCASE 2021 challenge. The goal of the task is to design an audio scene
classification system for device-imbalanced datasets under the constraints of
model complexity. This report introduces four methods to achieve the goal.
First, we propose Residual Normalization, a novel feature normalization method
that uses instance normalization with a shortcut path to discard unnecessary
device-specific information without losing useful information for
classification. Second, we design an efficient architecture, BC-ResNet-Mod, a
modified version of the baseline architecture with a limited receptive field.
Third, we exploit spectrogram-to-spectrogram translation from one to multiple
devices to augment training data. Finally, we utilize three model compression
schemes: pruning, quantization, and knowledge distillation to reduce model
complexity. The proposed system achieves an average test accuracy of 76.3% on
the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k
parameters, and an average test accuracy of 75.3% after compression to 61.0KB
of non-zero parameters.
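The Residual Normalization idea in the abstract (instance normalization plus a shortcut path) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a `(batch, channels, freq, time)` layout, assumes the normalization statistics are computed per sample and per frequency bin, and uses a hypothetical shortcut weight `lam`. The function names are illustrative only.

```python
import numpy as np


def freq_instance_norm(x, eps=1e-5):
    """Normalize each sample per frequency bin (assumed variant).

    x has shape (batch, channels, freq, time). Statistics are taken over
    the channel and time axes, so each (sample, frequency) pair gets its
    own mean and variance.
    """
    mean = x.mean(axis=(1, 3), keepdims=True)
    var = x.var(axis=(1, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def residual_norm(x, lam=0.1, eps=1e-5):
    """Normalized features plus a scaled identity shortcut.

    lam is a hypothetical weight on the shortcut path: the normalized
    branch removes per-frequency gain/offset, while the shortcut keeps
    part of the original signal.
    """
    return lam * x + freq_instance_norm(x, eps)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 8, 16))

# Simulate a device response as a per-frequency gain and offset.
gain = rng.uniform(0.5, 2.0, size=(1, 1, 8, 1))
offset = rng.normal(size=(1, 1, 8, 1))
x_dev = gain * x + offset

y = residual_norm(x_dev)
print(y.shape)
# The normalized branch is (nearly) unchanged by the simulated device response.
print(np.abs(freq_instance_norm(x_dev) - freq_instance_norm(x)).max())
```

In this sketch the normalized branch is nearly invariant to a per-frequency gain and offset, which is one way to read "discard unnecessary device-specific information", while the `lam * x` shortcut lets some of the original information through.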
Related papers
- Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction [11.15868814062321]
Three systems are introduced to tackle training splits of different sizes.
For small training splits, we explored reducing the complexity of the provided baseline model by reducing the number of base channels.
For the larger training splits, we use FocusNet to provide confusing class information to an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) models and baseline models trained on the original sampling rate of 44.1 kHz.
arXiv Detail & Related papers (2024-09-18T13:16:00Z)
- A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance.
We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset.
We introduce a novel piecewise power law (PPL) that handles the two data regimes differently.
arXiv Detail & Related papers (2023-03-02T21:48:22Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices [59.86658316440461]
We present a robust, low-complexity system for Acoustic Scene Classification (ASC).
We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue.
To further improve performance while still meeting the low-complexity requirement, we apply two techniques: an ensemble of multiple spectrograms and channel reduction.
arXiv Detail & Related papers (2022-03-23T10:27:41Z)
- Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization [10.992151305603267]
How to handle multi-device audio inputs with a single, efficiently designed acoustic scene classification system is a practical research topic.
We propose Residual Normalization, a novel feature normalization method that uses frequency-wise normalization with a shortcut path to discard unnecessary device-specific information.
The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters.
arXiv Detail & Related papers (2021-11-12T01:57:36Z)
- A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
arXiv Detail & Related papers (2021-07-03T16:25:24Z)
- Small footprint Text-Independent Speaker Verification for Embedded Systems [7.123796359179192]
We present a two-stage model architecture orders of magnitude smaller than common solutions for speaker verification.
We demonstrate that our solution can run on small devices typical of IoT systems, such as the Raspberry Pi 3B, with a latency below 200 ms on a 5 s utterance.
arXiv Detail & Related papers (2020-11-03T13:53:05Z)
- A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights on the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
- Generative Multi-Stream Architecture For American Sign Language Recognition [15.717424753251674]
Training on datasets with low feature-richness for complex applications limits optimal convergence below human performance.
We propose a generative multistream architecture, eliminating the need for additional hardware with the intent to improve feature convergence without risking impracticability.
Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
arXiv Detail & Related papers (2020-03-09T21:04:51Z)