Improving Polyphonic Sound Event Detection on Multichannel Recordings
with the Sørensen-Dice Coefficient Loss and Transfer Learning
- URL: http://arxiv.org/abs/2107.10471v1
- Date: Thu, 22 Jul 2021 06:14:23 GMT
- Title: Improving Polyphonic Sound Event Detection on Multichannel Recordings
with the Sørensen-Dice Coefficient Loss and Transfer Learning
- Authors: Karn N. Watcharasupat and Thi Ngoc Tho Nguyen and Ngoc Khanh Nguyen
and Zhen Jian Lee and Douglas L. Jones and Woon Seng Gan
- Abstract summary: Polyphonic sound event detection systems trained with Dice loss consistently outperformed those trained with cross-entropy loss.
We achieved further performance gains via the use of transfer learning and an appropriate combination of different data augmentation techniques.
- Score: 15.088901748728391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Sørensen–Dice Coefficient has recently seen rising popularity as a
loss function (also known as Dice loss) due to its robustness in tasks where
the number of negative samples significantly exceeds that of positive samples,
such as semantic segmentation, natural language processing, and sound event
detection. Conventional training of polyphonic sound event detection systems
with binary cross-entropy loss often results in suboptimal detection
performance as the training is often overwhelmed by updates from negative
samples. In this paper, we investigated the effect of the Dice loss, intra- and
inter-modal transfer learning, data augmentation, and recording formats, on the
performance of polyphonic sound event detection systems with multichannel
inputs. Our analysis showed that polyphonic sound event detection systems
trained with Dice loss consistently outperformed those trained with
cross-entropy loss across different training settings and recording formats in
terms of F1 score and error rate. We achieved further performance gains via the
use of transfer learning and an appropriate combination of different data
augmentation techniques.
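The contrast the abstract draws between the two losses can be illustrated with a minimal sketch. This is not the authors' implementation: the soft Dice formulation below is one common variant, and the function names, the smoothing constant `eps`, and the toy data are assumptions made for illustration only. It shows that a model predicting "no event" almost everywhere gets a small binary cross-entropy loss when negatives dominate, while the Dice loss stays close to its maximum.

```python
import math


def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2*sum(p*t) / (sum(p) + sum(t)).

    `eps` is an assumed smoothing term that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all frame/class entries."""
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    terms = [
        t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
        for p, t in zip(clipped, targets)
    ]
    return -sum(terms) / len(terms)


# Toy example (hypothetical data): 2 active entries out of 100.
targets = [1.0, 1.0] + [0.0] * 98
# A degenerate model that predicts "inactive" for every entry.
all_silent = [0.01] * 100

print("BCE  loss:", bce_loss(all_silent, targets))
print("Dice loss:", dice_loss(all_silent, targets))
```

On this toy input the cross-entropy value is small because the 98 correctly predicted negatives dominate the average, whereas the Dice loss depends only on the overlap with the positive entries and so remains high until the active events are actually detected.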
Related papers
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake
Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z) - Pretraining Representations for Bioacoustic Few-shot Detection using
Supervised Contrastive Learning [10.395255631261458]
In bioacoustic applications, most tasks come with few labelled training data, because annotating long recordings is time consuming and costly.
We show that learning a rich feature extractor from scratch can be achieved by leveraging data augmentation using a supervised contrastive learning framework.
We obtain an F-score of 63.46% on the validation set and 42.7% on the test set, ranking second in the DCASE challenge.
arXiv Detail & Related papers (2023-09-02T09:38:55Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio
Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z) - Anomalous Sound Detection using Audio Representation with Machine ID
based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z) - Improving the Robustness of Summarization Models by Detecting and
Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z) - Audiovisual transfer learning for audio tagging and sound event
detection [21.574781022415372]
We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection.
We adapt a baseline system utilizing only spectral acoustic inputs to make use of pretrained auditory and visual features.
We perform experiments with these modified models on an audiovisual multi-label data set.
arXiv Detail & Related papers (2021-06-09T21:55:05Z) - Cross-Referencing Self-Training Network for Sound Event Detection in
Audio Mixtures [23.568610919253352]
This paper proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training.
The results of these methods on both "validation" and "public evaluation" sets of the DESED database show significant improvement compared to the state-of-the-art systems in semi-supervised learning.
arXiv Detail & Related papers (2021-05-27T18:46:59Z) - Continual Learning for Fake Audio Detection [62.54860236190694]
This paper proposes Detecting Fake Without Forgetting, a continual-learning-based method, to make the model learn new spoofing attacks incrementally.
Experiments are conducted on the ASVspoof 2019 dataset.
arXiv Detail & Related papers (2021-04-15T07:57:05Z) - Unsupervised Contrastive Learning of Sound Event Representations [30.914808451327403]
Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data.
In this work, we explore unsupervised contrastive learning as a way to learn sound event representations.
Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels.
arXiv Detail & Related papers (2020-11-15T19:50:14Z) - End-to-end training of a two-stage neural network for defect detection [4.38301148531795]
A gradient-based, two-stage neural network has shown excellent results in surface defect detection.
We introduce end-to-end training of the two-stage network together with several extensions to the training process.
We show state-of-the-art results on three defect detection datasets.
arXiv Detail & Related papers (2020-07-15T13:42:26Z)