Multitask frame-level learning for few-shot sound event detection
- URL: http://arxiv.org/abs/2403.11091v1
- Date: Sun, 17 Mar 2024 05:00:40 GMT
- Title: Multitask frame-level learning for few-shot sound event detection
- Authors: Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang
- Abstract summary: This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples.
We introduce an innovative multitask frame-level SED framework and TimeFilterAug, a linear timing mask for data augmentation.
The proposed method achieves an F-score of 63.8%, securing the 1st rank in the few-shot bioacoustic event detection category.
- Score: 46.32294691870714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples. However, prevailing methods in few-shot SED predominantly rely on segment-level predictions, which often struggle to provide detailed, fine-grained predictions, particularly for events of brief duration. Although frame-level prediction strategies have been proposed to overcome these limitations, these strategies commonly face difficulties with prediction truncation caused by background noise. To alleviate this issue, we introduce an innovative multitask frame-level SED framework. In addition, we introduce TimeFilterAug, a linear timing mask for data augmentation, to increase the model's robustness and adaptability to diverse acoustic environments. The proposed method achieves an F-score of 63.8%, securing the 1st rank in the few-shot bioacoustic event detection category of the Detection and Classification of Acoustic Scenes and Events Challenge 2023.
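As a rough illustration of what frame-level prediction involves, and of the truncation problem the abstract mentions, the sketch below converts per-frame event probabilities into (onset, offset) pairs by simple thresholding. It is not the authors' method: the function name `frames_to_events`, the 0.5 threshold, and the hop size are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def frames_to_events(
    probs: List[float],
    threshold: float = 0.5,      # assumed decision threshold
    hop_seconds: float = 0.02,   # assumed time step between frames
) -> List[Tuple[float, float]]:
    """Group consecutive frames at or above the threshold into (onset, offset) events."""
    events: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the event currently being tracked
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i  # event begins at this frame
        elif not active and start is not None:
            # event ends just before this frame
            events.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:
        # event runs until the end of the clip
        events.append((start * hop_seconds, len(probs) * hop_seconds))
    return events


# One low-probability frame (index 3), for example caused by background
# noise, splits what may be a single event into two shorter ones.
noisy = [0.1, 0.9, 0.8, 0.3, 0.9, 0.9, 0.1]
print(frames_to_events(noisy, hop_seconds=1.0))
```

With a hop of one second, the single dip at frame 3 yields two events instead of one, which is the kind of truncated prediction that frame-level systems have to guard against.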
Related papers
- Double Mixture: Towards Continual Event Detection from Speech [60.33088725100812]
Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events.
This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events.
We propose a novel method, 'Double Mixture,' which merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting.
arXiv Detail & Related papers (2024-04-20T06:32:00Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- Adaptive Fake Audio Detection with Low-Rank Model Squeezing [50.7916414913962]
Traditional approaches, such as finetuning, are computationally intensive and pose a risk of impairing the acquired knowledge of known fake audio types.
We introduce the concept of training low-rank adaptation matrices tailored specifically to the newly emerging fake audio types.
Our approach offers several advantages, including reduced storage memory requirements and lower equal error rates.
arXiv Detail & Related papers (2023-06-08T06:06:42Z)
- Temporal Label Smoothing for Early Prediction of Adverse Events [0.0]
We propose Temporal Label Smoothing (TLS), a novel learning strategy that modulates smoothing strength as a function of proximity to the event of interest.
Our approach significantly improves performance on clinically-relevant metrics such as event recall at low false-alarm rates.
arXiv Detail & Related papers (2022-08-29T17:58:48Z)
- Few-shot bioacoustic event detection at the DCASE 2022 challenge [0.0]
Few-shot sound event detection is the task of detecting sound events despite having only a few labelled examples.
This paper presents an overview of the second edition of the few-shot bioacoustic sound event detection task included in the DCASE 2022 challenge.
The highest F-score on the evaluation set was 60%, a large improvement over last year's edition.
arXiv Detail & Related papers (2022-07-14T09:33:47Z)
- Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing [52.2231419645482]
This paper focuses on the weakly-supervised audio-visual video parsing task.
It aims to recognize all events belonging to each modality and localize their temporal boundaries.
arXiv Detail & Related papers (2022-04-25T11:41:17Z)
- A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes [10.512055210540668]
We study the solutions proposed by participants to analyze their robustness to varying levels of target to non-target signal-to-noise ratio and to the temporal localization of target sound events.
Results show that systems tend to spuriously predict short events when non-target events are present.
arXiv Detail & Related papers (2022-02-03T09:41:31Z)
- Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers [0.7776497736451751]
We propose a region proposal-based approach to few-shot sound event detection utilizing the Perceiver architecture.
Motivated by a lack of suitable benchmark datasets, we generate two new few-shot sound event localization datasets.
arXiv Detail & Related papers (2021-07-28T19:46:55Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)