Multi-Task Learning for Interpretable Weakly Labelled Sound Event
Detection
- URL: http://arxiv.org/abs/2008.07085v2
- Date: Thu, 29 Oct 2020 18:22:09 GMT
- Title: Multi-Task Learning for Interpretable Weakly Labelled Sound Event
Detection
- Authors: Soham Deshmukh, Bhiksha Raj, Rita Singh
- Abstract summary: This paper proposes a Multi-Task Learning framework for learning from Weakly Labelled Audio data.
We show that the chosen auxiliary task de-noises internal T-F representation and improves SED performance under noisy recordings.
The overall proposed framework outperforms existing benchmark models over all SNRs.
- Score: 34.99472489405047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly Labelled learning has garnered a lot of attention in recent
years due to its potential to scale Sound Event Detection (SED), and it is
formulated as a Multiple Instance Learning (MIL) problem. This paper proposes a
Multi-Task Learning (MTL) framework for learning from Weakly Labelled Audio
data which encompasses the traditional MIL setup. To show the utility of the
proposed framework, we use reconstruction of the input Time-Frequency (T-F)
representation as the auxiliary task. We show that the chosen auxiliary task
de-noises the internal T-F representation and improves SED performance under
noisy recordings. Our second contribution is the introduction of a two-step
Attention Pooling mechanism. With two steps in the attention mechanism, the
network retains better T-F-level information without compromising SED
performance. Visualising the first-step and second-step attention weights helps
localise the audio event in the T-F domain. To evaluate the proposed framework,
we remix the DCASE 2019 Task 1 acoustic scene data with DCASE 2018 Task 2 sound
event data at 0, 10 and 20 dB SNR, resulting in a multi-class Weakly Labelled
SED problem. The overall proposed framework outperforms existing benchmark
models over all SNRs, with 22.3%, 12.8% and 5.9% improvements over the
benchmark model at 0, 10 and 20 dB SNR respectively. We carry out an ablation
study to determine the contribution of each auxiliary task and of the two-step
Attention Pooling to the SED performance improvement. The code is publicly
released.
Related papers
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the ground-truth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - Robust, General, and Low Complexity Acoustic Scene Classification
Systems and An Effective Visualization for Presenting a Sound Scene Context [53.80051967863102]
We present a comprehensive analysis of Acoustic Scene Classification (ASC).
We propose an inception-based and low footprint ASC model, referred to as the ASC baseline.
Next, we improve the ASC baseline by proposing a novel deep neural network architecture.
arXiv Detail & Related papers (2022-10-16T19:07:21Z) - Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (F-measure 34.02) by a large margin.
arXiv Detail & Related papers (2022-07-15T22:41:30Z) - Voice2Series: Reprogramming Acoustic Models for Time Series
Classification [65.94154001167608]
Voice2Series is a novel end-to-end approach that reprograms acoustic models for time series classification.
We show that V2S either outperforms or is tied with state-of-the-art methods on 20 tasks, and improves their average accuracy by 1.84%.
arXiv Detail & Related papers (2021-06-17T07:59:15Z) - Improving weakly supervised sound event detection with self-supervised
auxiliary tasks [33.427215114252235]
We propose a shared encoder architecture with sound event detection as a primary task and an additional secondary decoder for a self-supervised auxiliary task.
We empirically evaluate the proposed framework for weakly supervised sound event detection on a remix dataset of the DCASE 2019 task 1 acoustic scene data.
The proposed framework with two-step attention outperforms existing benchmark models by 22.3%, 12.8%, 5.9% on 0, 10 and 20 dB SNR respectively.
arXiv Detail & Related papers (2021-06-12T20:28:22Z) - Environmental sound analysis with mixup based multitask learning and
cross-task fusion [0.12891210250935145]
Acoustic scene classification and acoustic event classification are two closely related tasks.
In this letter, a two-stage method is proposed for the above tasks.
The proposed method confirms the complementary characteristics of acoustic scene and acoustic event classification.
arXiv Detail & Related papers (2021-03-30T05:11:53Z) - Device-Robust Acoustic Scene Classification Based on Two-Stage
Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.