The impact of non-target events in synthetic soundscapes for sound event
detection
- URL: http://arxiv.org/abs/2109.14061v1
- Date: Tue, 28 Sep 2021 21:46:19 GMT
- Title: The impact of non-target events in synthetic soundscapes for sound event
detection
- Authors: Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell
- Abstract summary: We focus on the impact of non-target events in synthetic soundscapes.
We analyze to what extent adjusting the signal-to-noise ratio between target and non-target events at training time improves the sound event detection performance.
- Score: 13.616885869532533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detection and Classification Acoustic Scene and Events Challenge 2021 Task 4
uses a heterogeneous dataset that includes both recorded and synthetic
soundscapes. Until recently only target sound events were considered when
synthesizing the soundscapes. However, recorded soundscapes often contain a
substantial amount of non-target events that may affect the performance. In
this paper, we focus on the impact of these non-target events in the synthetic
soundscapes. Firstly, we investigate to what extent using non-target events
during only the training phase, only the validation phase, or neither phase
helps the system to correctly detect target events. Secondly, we analyze to
what extent adjusting the signal-to-noise ratio between target and non-target
events at training time improves the sound event detection performance. The
results show
that using both target and non-target events for only one of the phases
(validation or training) helps the system to properly detect sound events,
outperforming the baseline (which uses non-target events in both phases). The
paper also reports the results of a preliminary study on evaluating the system
on clips that contain only non-target events. This opens questions for future
work on the non-target event subset and on the acoustic similarity between
target and non-target events, which might confuse the system.
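The SNR adjustment between target and non-target events described in the abstract can be sketched as follows. This is a minimal illustration, assuming NumPy arrays of audio samples; the function name `mix_at_snr` and the white-noise stand-ins for the two events are hypothetical examples, not code from the paper.

```python
import numpy as np

def mix_at_snr(target, non_target, snr_db):
    """Scale non_target so the target-to-non-target SNR is snr_db, then mix.

    SNR is defined on signal power (mean square): we solve for a gain g
    such that p_target / (g^2 * p_non_target) = 10^(snr_db / 10).
    """
    p_target = np.mean(target ** 2)
    p_non_target = np.mean(non_target ** 2)
    gain = np.sqrt(p_target / (p_non_target * 10 ** (snr_db / 10)))
    return target + gain * non_target

# Example: white-noise stand-ins for a target and a non-target event,
# mixed so the non-target event sits 6 dB below the target.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
non_target = rng.standard_normal(16000)
mix = mix_at_snr(target, non_target, snr_db=6.0)
```

In a full soundscape synthesizer the same scaling would be applied per event before summing events into the background, but the power-ratio computation above is the core of the adjustment studied in the paper.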
Related papers
- Double Mixture: Towards Continual Event Detection from Speech [60.33088725100812]
Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events.
This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events.
We propose a novel method, 'Double Mixture,' which merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting.
arXiv Detail & Related papers (2024-04-20T06:32:00Z)
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
- Avoiding Post-Processing with Event-Based Detection in Biomedical Signals [69.34035527763916]
We propose an event-based modeling framework that directly works with events as learning targets.
We show that event-based modeling (without post-processing) performs on par with or better than epoch-based modeling with extensive post-processing.
arXiv Detail & Related papers (2022-09-22T13:44:13Z)
- Unifying Event Detection and Captioning as Sequence Generation via Pre-Training [53.613265415703815]
We propose a unified pre-training and fine-tuning framework to enhance the inter-task association between event detection and captioning.
Our model outperforms the state-of-the-art methods, and can be further boosted when pre-trained on extra large-scale video-text data.
arXiv Detail & Related papers (2022-07-18T14:18:13Z)
- Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (F-measure 34.02) by a large margin.
arXiv Detail & Related papers (2022-07-15T22:41:30Z)
- Few-shot bioacoustic event detection at the DCASE 2022 challenge [0.0]
Few-shot sound event detection is the task of detecting sound events despite having only a few labelled examples.
This paper presents an overview of the second edition of the few-shot bioacoustic sound event detection task included in the DCASE 2022 challenge.
The highest F-score was 60% on the evaluation set, a large improvement over last year's edition.
arXiv Detail & Related papers (2022-07-14T09:33:47Z)
- Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing [52.2231419645482]
This paper focuses on the weakly-supervised audio-visual video parsing task.
It aims to recognize all events belonging to each modality and localize their temporal boundaries.
arXiv Detail & Related papers (2022-04-25T11:41:17Z)
- A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes [10.512055210540668]
We study the solutions proposed by participants to analyze their robustness to varying the target to non-target signal-to-noise ratio and to the temporal localization of target sound events.
Results show that systems tend to spuriously predict short events when non-target events are present.
arXiv Detail & Related papers (2022-02-03T09:41:31Z)
- Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers [0.7776497736451751]
We propose a region proposal-based approach to few-shot sound event detection utilizing the Perceiver architecture.
Motivated by a lack of suitable benchmark datasets, we generate two new few-shot sound event localization datasets.
arXiv Detail & Related papers (2021-07-28T19:46:55Z)
- Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures [23.568610919253352]
This paper proposes a semi-supervised method for generating pseudo-labels from unsupervised data using a student-teacher scheme that balances self-training and cross-training.
The results of these methods on both the "validation" and "public evaluation" sets of the DESED database show significant improvements compared to state-of-the-art systems in semi-supervised learning.
arXiv Detail & Related papers (2021-05-27T18:46:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.