Wider or Deeper Neural Network Architecture for Acoustic Scene
Classification with Mismatched Recording Devices
- URL: http://arxiv.org/abs/2203.12314v1
- Date: Wed, 23 Mar 2022 10:27:41 GMT
- Title: Wider or Deeper Neural Network Architecture for Acoustic Scene
Classification with Mismatched Recording Devices
- Authors: Lam Pham, Khoa Dinh, Dat Ngo, Hieu Tang, Alexander Schindler
- Abstract summary: We present a robust and low-complexity system for Acoustic Scene Classification (ASC).
We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue.
To further improve performance while still meeting the low-complexity requirement, we apply two techniques: an ensemble of multiple spectrograms and channel reduction.
- Score: 59.86658316440461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a robust and low complexity system for Acoustic
Scene Classification (ASC), the task of identifying the scene of an audio
recording. We first construct an ASC baseline system in which a novel
inception-residual-based network architecture is proposed to deal with the
mismatched recording device issue. To further improve performance while still
meeting the low-complexity requirement, we apply two techniques to the ASC
baseline system: an ensemble of multiple spectrograms and channel reduction. By
conducting extensive experiments on the benchmark DCASE 2020 Task 1A Development
dataset, we obtain a best model with an accuracy of 69.9% and a low complexity
of 2.4M trainable parameters, which is competitive with state-of-the-art ASC
systems and shows potential for real-life applications on edge devices.
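The two techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the spectrogram-specific predictions are fused or which layer sizes are reduced, so the averaging rule, the layer shapes, and the function names `conv2d_params` and `average_fusion` below are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List, Sequence


def conv2d_params(in_ch: int, out_ch: int, kernel: int = 3, bias: bool = True) -> int:
    """Trainable parameters of a standard 2-D convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


def average_fusion(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class probabilities predicted by several models (late fusion)."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


# Channel reduction (assumed layer shapes): halving both the input and output
# channel counts cuts the convolution weights by roughly a factor of four.
full = conv2d_params(64, 128)    # 3*3*64*128 + 128
reduced = conv2d_params(32, 64)  # 3*3*32*64 + 64

# Ensemble of multiple spectrograms (assumed fusion rule): each row holds
# hypothetical class probabilities from a model trained on one spectrogram type.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
fused = average_fusion(probs)
predicted = max(range(len(fused)), key=fused.__getitem__)

print(full, reduced, predicted)
```

The sketch shows why the two techniques pull in opposite directions on model size: an ensemble multiplies the parameter count by the number of member models, while channel reduction shrinks each member roughly quadratically in the reduction factor.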
Related papers
- Tailored Design of Audio-Visual Speech Recognition Models using Branchformers [0.0]
We propose a novel framework for the design of parameter-efficient Audio-Visual Speech Recognition systems.
To be more precise, the proposed framework consists of two steps: first, estimating audio- and video-only systems, and then designing a tailored audio-visual unified encoder.
Results reflect how our tailored AVSR system is able to reach state-of-the-art recognition rates.
arXiv Detail & Related papers (2024-07-09T07:15:56Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Robust, General, and Low Complexity Acoustic Scene Classification
Systems and An Effective Visualization for Presenting a Sound Scene Context [53.80051967863102]
We present a comprehensive analysis of Acoustic Scene Classification (ASC).
We propose an inception-based and low footprint ASC model, referred to as the ASC baseline.
Next, we improve the ASC baseline by proposing a novel deep neural network architecture.
arXiv Detail & Related papers (2022-10-16T19:07:21Z)
- A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust
Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
arXiv Detail & Related papers (2021-07-03T16:25:24Z)
- A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights on the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage
Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
- A Multi-view CNN-based Acoustic Classification System for Automatic
Animal Species Identification [42.119250432849505]
We propose a deep-learning-based acoustic classification framework for Wireless Acoustic Sensor Networks (WASN).
The proposed framework is based on a cloud architecture, which relaxes the computational burden on the wireless sensor node.
To improve the recognition accuracy, we design a multi-view Convolution Neural Network (CNN) to extract the short-, middle-, and long-term dependencies in parallel.
arXiv Detail & Related papers (2020-02-23T03:51:08Z)