Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction
- URL: http://arxiv.org/abs/2409.11964v1
- Date: Wed, 18 Sep 2024 13:16:00 GMT
- Title: Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction
- Authors: Jin Jie Sean Yeo, Ee-Leng Tan, Jisheng Bai, Santi Peksi, Woon-Seng Gan,
- Abstract summary: Three systems are introduced to tackle training splits of different sizes.
For small training splits, we explored reducing the complexity of the provided baseline model by reducing the number of base channels.
For the larger training splits, we use FocusNet to provide confusing class information to an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) models and baseline models trained on the original sampling rate of 44.1 kHz.
- Score: 11.15868814062321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this technical report, we describe the SNTL-NTU team's submission for Task 1 Data-Efficient Low-Complexity Acoustic Scene Classification of the detection and classification of acoustic scenes and events (DCASE) 2024 challenge. Three systems are introduced to tackle training splits of different sizes. For small training splits, we explored reducing the complexity of the provided baseline model by reducing the number of base channels. We introduce data augmentation in the form of mixup to increase the diversity of training samples. For the larger training splits, we use FocusNet to provide confusing class information to an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) models and baseline models trained on the original sampling rate of 44.1 kHz. We use Knowledge Distillation to distill the ensemble model to the baseline student model. Training the systems on the TAU Urban Acoustic Scene 2022 Mobile development dataset yielded the highest average testing accuracy of (62.21, 59.82, 56.81, 53.03, 47.97)% on split (100, 50, 25, 10, 5)% respectively over the three systems.
Related papers
- Low-complexity deep learning frameworks for acoustic scene
classification using teacher-student scheme and multiple spectrograms [59.86658316440461]
The proposed system comprises two main phases: (Phase I) Training a teacher network; and (Phase II) training a student network using distilled knowledge from the teacher.
Our experiments conducted on DCASE 2023 Task 1 Development dataset have fulfilled the requirement of low-complexity and achieved the best classification accuracy of 57.4%.
arXiv Detail & Related papers (2023-05-16T14:21:45Z) - QTI Submission to DCASE 2021: residual normalization for
device-imbalanced acoustic scene classification with efficient design [11.412720572948087]
The goal of the task is to design an audio scene classification system for device-imbalanced datasets under the constraints of model complexity.
This report introduces four methods to achieve the goal.
The proposed system achieves an average test accuracy of 76.3% in TAU Urban Acoustic Scenes 2020 Mobile, development dataset with 315k parameters, and average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters.
arXiv Detail & Related papers (2022-06-28T11:42:52Z) - Low-complexity deep learning frameworks for acoustic scene
classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC)
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments conducted on DCASE 2022 Task 1 Development dataset have fullfiled the requirement of low-complexity and achieved the best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z) - Deep Transfer Learning for WiFi Localization [4.260395796577057]
This paper studies a WiFi indoor localisation technique based on using a deep learning model and its transfer strategies.
We achieve a localisation accuracy of 46.55 cm in an ideal $(6.5m times 2.5m)$ office with no obstacles, 58.30 cm in an office with obstacles, and 102.8 cm in a sports hall.
arXiv Detail & Related papers (2021-03-08T22:21:40Z) - Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation [95.31590177308482]
We propose an automated multi-loss adaptation (named Ada-Segment) to flexibly adjust multiple training losses over the course of training.
With an end-to-end architecture, Ada-Segment generalizes to different datasets without the need of re-tuning hyper parameters.
Ada-Segment brings 2.7% panoptic quality (PQ) improvement on COCO val split from the vanilla baseline, achieving the state-of-the-art 48.5% PQ on COCO test-dev split and 32.9% PQ on ADE20K dataset.
arXiv Detail & Related papers (2020-12-07T11:43:10Z) - A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
Two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights on the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z) - Device-Robust Acoustic Scene Classification Based on Two-Stage
Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns with classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z) - Capturing scattered discriminative information using a deep architecture
in acoustic scene classification [49.86640645460706]
In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augment methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
arXiv Detail & Related papers (2020-07-09T08:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.