DCASE 2022: Comparative Analysis Of CNNs For Acoustic Scene
Classification Under Low-Complexity Considerations
- URL: http://arxiv.org/abs/2206.08007v1
- Date: Thu, 16 Jun 2022 09:03:56 GMT
- Title: DCASE 2022: Comparative Analysis Of CNNs For Acoustic Scene
Classification Under Low-Complexity Considerations
- Authors: Josep Zaragoza-Paredes, Javier Naranjo-Alcazar, Valery Naranjo and
Pedro Zuccarello
- Abstract summary: This report makes a comparative study of two different network architectures: conventional CNN and Conv-mixer.
Although both networks exceed the baseline required by the competition, the conventional CNN shows a higher performance.
Solutions based on Conv-mixer architectures show worse performance although they are much lighter solutions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic scene classification is an automatic listening problem that aims to
assign an audio recording to a pre-defined scene based on its audio data. Over
the years (and in past editions of DCASE) this problem has often been
solved with ensembles, i.e. several machine learning models whose
predictions are combined at inference time. While these solutions can
achieve high accuracy, they can be very expensive in terms of computation,
making it impossible to deploy them on IoT devices. Reflecting this trend,
this task imposes two limits on model complexity: the number of parameters
and the number of multiply-accumulate operations (MACs). The task also adds
the difficulty of mismatched devices: the provided audio clips were recorded
with different devices. This technical report makes a comparative
study of two different network architectures: conventional CNN and Conv-mixer.
Although both networks exceed the baseline required by the competition, the
conventional CNN shows a higher performance, exceeding the baseline by 8
percentage points. Solutions based on Conv-mixer architectures show worse
performance although they are much lighter solutions.
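The abstract describes a trade-off: ensembles cost too much, and Conv-mixer is lighter than a conventional CNN. Counting parameters and multiply-accumulate operations (MACs) makes that trade-off concrete. Below is a minimal, hypothetical calculator, not the authors' code. It rests on several assumptions:

- The layer sizes (32 channels, a 32×32 feature map, three blocks, 3×3 kernels) are illustrative and are not the report's architectures.
- "Conv-mixer" is taken to mean the ConvMixer block design, in which each block is a depthwise convolution followed by a pointwise 1×1 convolution. ConvMixer's strided patch-embedding stem is replaced by a plain 3×3 stem so the comparison is like-for-like.
- Batch-norm and activation layers are ignored.
- The budget is the DCASE 2022 Task 1 limit of 128K parameters and 30 million MACs per inference. Check this against the official task description.

```python
from dataclasses import dataclass

# ASSUMPTION: DCASE 2022 Task 1 complexity limits (verify against the task page).
MAX_PARAMS = 128_000
MAX_MACS = 30_000_000


@dataclass(frozen=True)
class Conv2d:
    """Shape-only description of a 2-D convolution (hypothetical helper)."""
    in_ch: int
    out_ch: int
    kernel: int
    out_hw: tuple  # (height, width) of the output feature map
    groups: int = 1

    @property
    def params(self) -> int:
        # kernel weights plus one bias per output channel
        return self.out_ch * (self.in_ch // self.groups) * self.kernel ** 2 + self.out_ch

    @property
    def macs(self) -> int:
        # every output element needs (in_ch / groups) * k * k multiply-accumulates
        h, w = self.out_hw
        return self.out_ch * (self.in_ch // self.groups) * self.kernel ** 2 * h * w


def totals(layers):
    """Sum parameters and MACs over a list of layers (BN/activations ignored)."""
    return sum(layer.params for layer in layers), sum(layer.macs for layer in layers)


def conventional_cnn(ch=32, hw=(32, 32), depth=3, k=3):
    """Hypothetical plain CNN: a stem conv followed by `depth` full k x k convolutions."""
    return [Conv2d(1, ch, k, hw)] + [Conv2d(ch, ch, k, hw) for _ in range(depth)]


def conv_mixer(ch=32, hw=(32, 32), depth=3, k=3):
    """Hypothetical ConvMixer-style stack: stem, then `depth` depthwise + pointwise pairs."""
    layers = [Conv2d(1, ch, k, hw)]  # same stem as the CNN for a like-for-like comparison
    for _ in range(depth):
        layers.append(Conv2d(ch, ch, k, hw, groups=ch))  # depthwise: mixes spatial positions per channel
        layers.append(Conv2d(ch, ch, 1, hw))             # pointwise 1x1: mixes channels
        # (the residual connection around the depthwise conv adds no parameters or MACs)
    return layers


def ensemble(layers, n_models):
    """Cost of running n_models identical copies and combining their predictions."""
    params, macs = totals(layers)
    return n_models * params, n_models * macs


def fits_budget(params, macs):
    """True if a model (or ensemble) stays within both assumed limits."""
    return params <= MAX_PARAMS and macs <= MAX_MACS


if __name__ == "__main__":
    cnn, mixer = conventional_cnn(), conv_mixer()
    print("CNN:", totals(cnn), "fits:", fits_budget(*totals(cnn)))
    print("CNN x2 ensemble fits:", fits_budget(*ensemble(cnn, 2)))
    print("Conv-mixer:", totals(mixer), "x6 ensemble fits:", fits_budget(*ensemble(mixer, 6)))
```

With these assumed sizes, the conventional stack needs 28,064 parameters and 28,606,464 MACs. It fits the budget on its own, but an ensemble of two such models does not. The depthwise/pointwise stack needs 4,448 parameters and 4,325,376 MACs, so even six copies stay under the assumed MAC limit. The saving comes from splitting each C×C×k×k convolution into a depthwise part (C·k² weights) and a pointwise part (C² weights).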
Related papers
- Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio
Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv (2023-08-07T05:05:49Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained end-to-end.
We simplify and speed-up the training by using a single multiscale spectrogram adversary.
arXiv (2022-10-24T17:52:02Z)
- Low-complexity CNNs for Acoustic Scene Classification [23.661189257759535]
This paper presents a low-complexity framework for acoustic scene classification (ASC).
Most of the frameworks designed for ASC use convolutional neural networks (CNNs) due to their learning ability and improved performance.
CNNs are resource hungry due to their large size and high computational complexity.
arXiv (2022-07-23T14:37:39Z)
- Wider or Deeper Neural Network Architecture for Acoustic Scene
Classification with Mismatched Recording Devices [59.86658316440461]
We present a robust, low-complexity system for Acoustic Scene Classification (ASC).
We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue.
To further improve the performance but still satisfy the low complexity model, we apply two techniques: ensemble of multiple spectrograms and channel reduction.
arXiv (2022-03-23T10:27:41Z)
- Task 1A DCASE 2021: Acoustic Scene Classification with mismatch-devices
using squeeze-excitation technique and low-complexity constraint [4.4973334555746]
Acoustic scene classification (ASC) is one of the most popular problems in the field of machine listening.
The subtask presented in this report corresponds to an ASC problem that is constrained by the complexity of the model.
Specifically, a two-step system is proposed: a two-dimensional representation of the audio using the Gammatone filter bank, followed by a convolutional neural network.
arXiv (2021-07-30T14:24:45Z)
- TASK3 DCASE2021 Challenge: Sound event localization and detection using
squeeze-excitation residual CNNs [4.4973334555746]
This study builds on the one carried out by the same team the previous year.
The authors study how the squeeze-excitation technique affects performance on each of the datasets.
This modification improves the performance of the system compared to the baseline on the MIC dataset.
arXiv (2021-07-30T11:34:15Z)
- A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights on the patterns learnt by our models.
arXiv (2020-11-03T03:27:18Z)
- Cross-domain Adaptation with Discrepancy Minimization for
Text-independent Forensic Speaker Verification [61.54074498090374]
This study introduces the CRSS-Forensics audio dataset, collected in multiple acoustic environments.
We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics.
arXiv (2020-09-05T02:54:33Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage
Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv (2020-07-16T15:07:14Z)
- Acoustic Scene Classification with Squeeze-Excitation Residual Networks [4.591851728010269]
We propose two novel squeeze-excitation blocks to improve the accuracy of a CNN-based ASC framework based on residual learning.
The behavior of the block that implements such operators and, therefore, the entire neural network, can be modified depending on the input to the block.
arXiv (2020-03-20T14:07:11Z)
- NAViDAd: A No-Reference Audio-Visual Quality Metric Based on a Deep
Autoencoder [0.0]
We propose a No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd).
The model is formed by a 2-layer framework that includes a deep autoencoder layer and a classification layer.
The model performed well when tested against the UnB-AV and the LiveNetflix-II databases.
arXiv (2020-01-30T15:40:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.