Task 1A DCASE 2021: Acoustic Scene Classification with mismatch-devices
using squeeze-excitation technique and low-complexity constraint
- URL: http://arxiv.org/abs/2107.14658v1
- Date: Fri, 30 Jul 2021 14:24:45 GMT
- Title: Task 1A DCASE 2021: Acoustic Scene Classification with mismatch-devices
using squeeze-excitation technique and low-complexity constraint
- Authors: Javier Naranjo-Alcazar, Sergi Perez-Castanos, Maximo Cobos, Francesc
J. Ferri, Pedro Zuccarello
- Abstract summary: Acoustic scene classification (ASC) is one of the most popular problems in the field of machine listening.
The subtask presented in this report corresponds to an ASC problem that is constrained by the complexity of the model.
Specifically, a system based on two steps is proposed: a two-dimensional representation of the audio using the Gammatone filter bank and a convolutional neural network.
- Score: 4.4973334555746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acoustic scene classification (ASC) is one of the most popular problems in
the field of machine listening. The objective of this problem is to classify an
audio clip into one of the predefined scenes using only the audio data. This
problem has considerably progressed over the years in the different editions of
DCASE. It usually has several subtasks that allow this problem to be tackled with
different approaches. The subtask presented in this report corresponds to an ASC
problem that is constrained by the complexity of the model as well as having
audio recorded from different devices, known as mismatch devices (real and
simulated). The work presented in this report follows the research line carried
out by the team in previous years. Specifically, a system based on two steps is
proposed: a two-dimensional representation of the audio using the Gammatone
filter bank and a convolutional neural network using squeeze-excitation
techniques. The presented system outperforms the baseline by about 17
percentage points.
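The report describes the two-step system only at a high level, so a minimal PyTorch sketch of the squeeze-excitation recalibration it relies on may help. This is an illustration, not the authors' implementation; the channel counts, reduction ratio, and Gammatone input shape are assumptions:

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Channel-wise squeeze-and-excitation recalibration (Hu et al., 2018)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # "squeeze": average over the time-frequency plane
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # "excitation": one gate in (0, 1) per channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gates = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gates  # rescale each feature map by its learned gate

class SEConvBlock(nn.Module):
    """One convolutional stage followed by squeeze-excitation recalibration."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.se = SqueezeExcitation(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.se(self.conv(x))

# The network input would be the Gammatone spectrogram from step one,
# e.g. a (batch, 1, bands, frames) tensor; 64 bands and 500 frames are assumed here.
block = SEConvBlock(1, 32)
features = block(torch.randn(4, 1, 64, 500))
```

The sigmoid gates let the network re-weight feature maps channel by channel before the next convolutional stage, which is the recalibration step the abstract refers to as squeeze-excitation.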
Related papers
- Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models [56.776580717999806]
Real-world applications often involve processing multiple audio streams simultaneously.
We propose the first multi-audio evaluation benchmark that consists of 20 datasets from 11 multi-audio tasks.
We propose a novel multi-audio-LLM (MALLM) to capture audio context among multiple similar audios.
arXiv Detail & Related papers (2024-09-27T12:06:53Z)
- TranssionADD: A multi-frame reinforcement based sequence tagging model for audio deepfake detection [11.27584658526063]
The second Audio Deepfake Detection Challenge (ADD 2023) aims to detect and analyze deepfake speech utterances.
We propose our novel TranssionADD system as a solution to the challenging problem of model robustness and audio segment outliers.
Our best submission achieved 2nd place in Track 2, demonstrating the effectiveness and robustness of our proposed system.
arXiv Detail & Related papers (2023-06-27T05:18:25Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
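As a rough sketch of the machine-ID-based contrastive idea summarized above (an assumed formulation, not the paper's code; the batch layout and temperature are illustrative), clips sharing a machine ID can be treated as positives in a supervised contrastive loss:

```python
import torch
import torch.nn.functional as F

def machine_id_contrastive_loss(embeddings: torch.Tensor,
                                machine_ids: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss in which clips from the same machine ID are positive pairs.

    embeddings:  (batch, dim) audio embeddings from the encoder being pretrained
    machine_ids: (batch,) integer machine identifiers
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                      # pairwise cosine similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (machine_ids.unsqueeze(0) == machine_ids.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                             # anchors with at least one positive
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return (-pos_log_prob[valid] / pos_counts[valid]).mean()

# Hypothetical usage: 16 clips from 4 machines, 128-dimensional embeddings.
loss = machine_id_contrastive_loss(torch.randn(16, 128), torch.randint(0, 4, (16,)))
```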
- Efficient Audio Captioning Transformer with Patchout and Text Guidance [74.59739661383726]
We propose a full Transformer architecture that utilizes Patchout as proposed in [1], significantly reducing the computational complexity and avoiding overfitting.
The caption generation is partly conditioned on textual AudioSet tags extracted by a pre-trained classification model.
Our proposed method received the Judges Award at Task 6A of the DCASE Challenge 2022.
arXiv Detail & Related papers (2023-04-06T07:58:27Z)
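Patchout itself reduces to dropping a random subset of spectrogram patch tokens during training, shrinking the sequence that self-attention must process. A minimal sketch under assumed shapes (the keep ratio and token layout are illustrative, not the paper's configuration):

```python
import torch

def patchout(tokens: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Randomly keep a fraction of spectrogram patch tokens during training.

    tokens: (batch, n_patches, dim) patch embeddings of an audio spectrogram.
    Returns a shorter sequence, cutting the quadratic cost of self-attention.
    """
    batch, n, dim = tokens.shape
    n_keep = max(1, int(n * keep_ratio))
    # An independent random subset of patches for each example in the batch.
    idx = torch.rand(batch, n, device=tokens.device).argsort(dim=1)[:, :n_keep]
    return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, dim))

# Hypothetical usage: halve a 256-token patch sequence before the Transformer encoder.
shorter = patchout(torch.randn(8, 256, 192))  # -> (8, 128, 192)
```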
- AudioGen: Textually Guided Audio Generation [116.57006301417306]
We tackle the problem of generating audio samples conditioned on descriptive text captions.
In this work, we propose AudioGen, an auto-regressive model that generates audio samples conditioned on text inputs.
arXiv Detail & Related papers (2022-09-30T10:17:05Z)
- DCASE 2022: Comparative Analysis Of CNNs For Acoustic Scene Classification Under Low-Complexity Considerations [1.6704594205447996]
This report makes a comparative study of two different network architectures: conventional CNN and Conv-mixer.
Although both networks exceed the baseline required by the competition, the conventional CNN achieves higher performance.
Solutions based on Conv-mixer architectures perform worse, although they are much lighter.
arXiv Detail & Related papers (2022-06-16T09:03:56Z)
- A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection [9.914246432182873]
We show that an end-to-end model performs at least as well as a considerably larger two-step system under various noise conditions.
In experiments involving over 50 thousand hours of public YouTube videos as training data, we first evaluate the accuracy of the attention layer on an active speaker selection task.
arXiv Detail & Related papers (2022-05-11T15:55:31Z)
- TASK3 DCASE2021 Challenge: Sound event localization and detection using squeeze-excitation residual CNNs [4.4973334555746]
This study builds on the one carried out by the same team last year.
It examines how this technique improves performance on each of the datasets.
The modification improves system performance over the baseline on the MIC dataset.
arXiv Detail & Related papers (2021-07-30T11:34:15Z)
- Spectrum Correction: Acoustic Scene Classification with Mismatched Recording Devices [9.404066316241051]
Machine learning algorithms, when trained on audio recordings from a limited set of devices, may not generalize well to samples recorded using other devices with different frequency responses.
In this work, a relatively straightforward method is introduced to address this problem.
Two variants of the approach are presented: the first requires aligned examples from multiple devices, while the second alleviates this requirement.
arXiv Detail & Related papers (2021-05-25T11:53:17Z)
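The aligned variant of spectrum correction admits a compact sketch: estimate a per-frequency correction curve from time-aligned recordings of the same content on both devices, then rescale the mismatched device's spectra. The NumPy fragment below is an illustration under assumed shapes, not the authors' implementation:

```python
import numpy as np

def correction_curve(ref_specs: np.ndarray, dev_specs: np.ndarray) -> np.ndarray:
    """Per-frequency-bin correction from aligned recordings on two devices.

    ref_specs, dev_specs: (n_examples, n_freqs, n_frames) magnitude spectrograms
    of the reference device and the mismatched device, time-aligned.
    """
    ref_profile = ref_specs.mean(axis=(0, 2))     # average magnitude per frequency bin
    dev_profile = dev_specs.mean(axis=(0, 2))
    return ref_profile / (dev_profile + 1e-8)     # multiplicative correction per bin

def apply_correction(spec: np.ndarray, curve: np.ndarray) -> np.ndarray:
    """Rescale each frequency bin of one (n_freqs, n_frames) spectrogram."""
    return spec * curve[:, None]
```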
- Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
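The band-wise statistics matching described above also lends itself to a short sketch: standardize each frequency band of the target-domain data with its own statistics, then re-color it with the source-domain statistics. The shapes below are assumptions, not the authors' setup:

```python
import numpy as np

def bandwise_stats_matching(target: np.ndarray,
                            src_mean: np.ndarray,
                            src_std: np.ndarray) -> np.ndarray:
    """Align first- and second-order statistics of each frequency band.

    target:   (n_clips, n_bands, n_frames) target-domain spectrograms
    src_mean: (n_bands,) per-band mean of the source-domain training set
    src_std:  (n_bands,) per-band standard deviation of the source domain
    """
    tgt_mean = target.mean(axis=(0, 2), keepdims=True)    # (1, n_bands, 1)
    tgt_std = target.std(axis=(0, 2), keepdims=True) + 1e-8
    standardized = (target - tgt_mean) / tgt_std          # zero mean, unit variance per band
    return standardized * src_std[None, :, None] + src_mean[None, :, None]

# Hypothetical usage: adapt clips from an unseen device to training-device statistics.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(32, 40, 500))
target = rng.normal(0.5, 2.0, size=(8, 40, 500))
adapted = bandwise_stats_matching(target, source.mean(axis=(0, 2)), source.std(axis=(0, 2)))
```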
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.