Device-Robust Acoustic Scene Classification Based on Two-Stage
Categorization and Data Augmentation
- URL: http://arxiv.org/abs/2007.08389v2
- Date: Thu, 27 Aug 2020 00:33:27 GMT
- Title: Device-Robust Acoustic Scene Classification Based on Two-Stage
Categorization and Data Augmentation
- Authors: Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian
Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun
Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
- Abstract summary: We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
- Score: 63.98724740606457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this technical report, we present a joint effort of four groups, namely
GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification
(ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks:
(i) Task 1a focuses on ASC of audio signals recorded with multiple (real and
simulated) devices into ten different fine-grained classes, and (ii) Task 1b
concerns the classification of data into three higher-level classes using
low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system
leveraging an ad-hoc score combination of two convolutional neural networks
(CNNs), which classify the acoustic input first into three coarse classes and
then into ten fine-grained classes. Four different CNN-based architectures are
explored to
implement the two-stage classifiers, and several data augmentation techniques
are also investigated. For Task 1b, we leverage a quantization method to
reduce the complexity of two of our top-accuracy three-class CNN-based
architectures. On the Task 1a development data set, an ASC accuracy of 76.9% is
attained using our best single classifier and data augmentation. An accuracy of
81.9% is then attained by a final model fusion of our two-stage ASC
classifiers. On the Task 1b development data set, we achieve an accuracy of 96.7%
with a model size smaller than 500KB. Code is available:
https://github.com/MihawkHu/DCASE2020_task1.
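The two-stage score combination described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' released implementation: the fine-to-coarse class mapping and the multiplicative re-weighting rule are assumptions made for the example.

```python
import numpy as np

# Hypothetical mapping of the ten fine-grained scene classes to the three
# higher-level categories (e.g. indoor / outdoor / transportation).
FINE_TO_COARSE = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])

def two_stage_fusion(coarse_probs, fine_probs):
    """Re-weight each fine-class probability by the probability of its
    parent coarse class, then renormalize to a valid distribution."""
    weighted = fine_probs * coarse_probs[FINE_TO_COARSE]
    return weighted / weighted.sum()

# First-stage (3-class) and second-stage (10-class) softmax outputs.
coarse = np.array([0.7, 0.2, 0.1])
fine = np.full(10, 0.1)  # second stage is uncertain: uniform scores
fused = two_stage_fusion(coarse, fine)
```

Even when the 10-class stage is uncertain, the fused scores are pulled toward the fine classes whose parent category the 3-class stage is confident about.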
Related papers
- Robust, General, and Low Complexity Acoustic Scene Classification
Systems and An Effective Visualization for Presenting a Sound Scene Context [53.80051967863102]
We present a comprehensive analysis of Acoustic Scene Classification (ASC).
We propose an inception-based and low footprint ASC model, referred to as the ASC baseline.
Next, we improve the ASC baseline by proposing a novel deep neural network architecture.
arXiv Detail & Related papers (2022-10-16T19:07:21Z)
- Wider or Deeper Neural Network Architecture for Acoustic Scene
Classification with Mismatched Recording Devices [59.86658316440461]
We present a robust and low-complexity system for Acoustic Scene Classification (ASC).
We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue.
To further improve performance while keeping the model complexity low, we apply two techniques: an ensemble of multiple spectrograms and channel reduction.
arXiv Detail & Related papers (2022-03-23T10:27:41Z)
- A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust
Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
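The pruning-plus-quantization recipe summarized above builds on standard post-training weight quantization. A minimal sketch of symmetric int8 quantization follows; the per-tensor scheme here is an assumption for illustration, not the paper's exact method.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map float weights into the
    int8 range [-127, 127] with a single scale factor."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 tensor."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)  # close to w, at a quarter of the storage
```

Storing int8 weights plus one scale per tensor is what lets a CNN fit under tight model-size budgets such as Task 1b's 500KB limit.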
arXiv Detail & Related papers (2021-07-03T16:25:24Z)
- Environmental sound analysis with mixup based multitask learning and
cross-task fusion [0.12891210250935145]
Acoustic scene classification and acoustic event classification are two closely related tasks.
In this letter, a two-stage method is proposed for the above tasks.
The proposed method confirms the complementary characteristics of acoustic scene and acoustic event classification.
arXiv Detail & Related papers (2021-03-30T05:11:53Z)
- TechTexC: Classification of Technical Texts using Convolution and
Bidirectional Long Short Term Memory Network [0.0]
A classification system (called 'TechTexC') is developed to perform the classification task using three techniques.
Results show that CNN with BiLSTM model outperforms the other techniques concerning task-1 of sub-tasks (a, b, c and g) and task-2a.
On the test set, the combined CNN with BiLSTM approach achieved the highest accuracy for subtasks 1a (70.76%), 1b (79.97%), 1c (65.45%), 1g (49.23%), and 2a (70.14%).
arXiv Detail & Related papers (2020-12-21T15:22:47Z)
- A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights on the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z)
- Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID
Twitter BERT and Bagging Ensemble Technique based on Plurality Voting [0.0]
We develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not.
Our final approach achieved an F1-score of 0.9037, and we were ranked sixth overall with F1-score as the evaluation criterion.
arXiv Detail & Related papers (2020-10-01T10:54:54Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.