What Makes Sound Event Localization and Detection Difficult? Insights
from Error Analysis
- URL: http://arxiv.org/abs/2107.10469v1
- Date: Thu, 22 Jul 2021 06:01:49 GMT
- Title: What Makes Sound Event Localization and Detection Difficult? Insights
from Error Analysis
- Authors: Thi Ngoc Tho Nguyen and Karn N. Watcharasupat and Zhen Jian Lee and
Ngoc Khanh Nguyen and Douglas L. Jones and Woon Seng Gan
- Abstract summary: Sound event localization and detection (SELD) is an emerging research topic that aims to unify the tasks of sound event detection and direction-of-arrival estimation.
SELD inherits the challenges of both tasks, such as noise, reverberation, interference, polyphony, and non-stationarity of sound sources.
Previous studies have shown that unknown interferences in reverberant environments often cause major degradation in the performance of SELD systems.
- Score: 15.088901748728391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sound event localization and detection (SELD) is an emerging research topic
that aims to unify the tasks of sound event detection and direction-of-arrival
estimation. As a result, SELD inherits the challenges of both tasks, such as
noise, reverberation, interference, polyphony, and non-stationarity of sound
sources. Furthermore, SELD often faces an additional challenge of assigning
correct correspondences between the detected sound classes and directions of
arrival to multiple overlapping sound events. Previous studies have shown that
unknown interferences in reverberant environments often cause major degradation
in the performance of SELD systems. To further understand the challenges of the
SELD task, we performed a detailed error analysis on two of our SELD systems,
which both ranked second in the team category of DCASE SELD Challenge, one in
2020 and one in 2021. Experimental results indicate polyphony as the main
challenge in SELD, due to the difficulty in detecting all sound events of
interest. In addition, the SELD systems tend to make fewer errors for the
polyphonic scenario that is dominant in the training set.
Related papers
- Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning [55.2480439325792]
Large audio-language models (LALMs) have shown impressive capabilities in understanding and reasoning about audio and speech information.
These models still face challenges, including hallucinating non-existent sound events, misidentifying the order of sound events, and incorrectly attributing sound sources.
arXiv Detail & Related papers (2024-10-21T15:55:27Z) - Activity-Guided Industrial Anomalous Sound Detection against Interferences [8.864726245462908]
We propose SSAD, a framework of source separation (SS) followed by anomaly detection (AD)
SSAD consists of two components: (i) activity-informed SS, enabling effective source separation even given interference with similar timbre, and (ii) two-step masking, robustifying anomaly detection by emphasizing anomalies aligned with the machine activity.
Our experiments demonstrate that SSAD achieves comparable accuracy to a baseline with full access to clean signals, while SSAD is provided only a corrupted signal and activity information.
arXiv Detail & Related papers (2024-09-03T13:26:25Z) - Double Mixture: Towards Continual Event Detection from Speech [60.33088725100812]
Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events.
This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events.
We propose a novel method, 'Double Mixture,' which merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting.
arXiv Detail & Related papers (2024-04-20T06:32:00Z) - Sound Event Detection and Localization with Distance Estimation [4.139846693958608]
3D SELD is a combined task of identifying sound events and their corresponding direction-of-arrival (DOA)
We study two ways of integrating distance estimation within the SELD core.
Our results show that it is possible to perform 3D SELD without any degradation of performance in sound event detection and DOA estimation.
arXiv Detail & Related papers (2024-03-18T14:34:16Z) - Robust Tiny Object Detection in Aerial Images amidst Label Noise [50.257696872021164]
This study addresses the issue of tiny object detection under noisy label supervision.
We propose a DeNoising Tiny Object Detector (DN-TOD), which incorporates a Class-aware Label Correction scheme.
Our method can be seamlessly integrated into both one-stage and two-stage object detection pipelines.
arXiv Detail & Related papers (2024-01-16T02:14:33Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - Few-shot bioacoustic event detection at the DCASE 2022 challenge [0.0]
Few-shot sound event detection is the task of detecting sound events despite having only a few labelled examples.
This paper presents an overview of the second edition of the few-shot bioacoustic sound event detection task included in the DCASE 2022 challenge.
The highest F-score was of 60% on the evaluation set, which leads to a huge improvement over last year's edition.
arXiv Detail & Related papers (2022-07-14T09:33:47Z) - Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised
Anomalous Sound Detection for Machine Condition Monitoring under Domain
Shifted Conditions [37.68195595947483]
This task focuses on the inevitable problem for the practical use of ASD systems.
The main challenge of this task is to detect unknown anomalous sounds where the acoustic characteristics of the training and testing samples are different.
arXiv Detail & Related papers (2021-06-08T16:26:10Z) - Multi-Scale One-Class Recurrent Neural Networks for Discrete Event
Sequence Anomaly Detection [63.825781848587376]
We propose OC4Seq, a one-class recurrent neural network for detecting anomalies in discrete event sequences.
Specifically, OC4Seq embeds the discrete event sequences into latent spaces, where anomalies can be easily detected.
arXiv Detail & Related papers (2020-08-31T04:48:22Z) - Deep Dense and Convolutional Autoencoders for Unsupervised Anomaly
Detection in Machine Condition Sounds [55.18259748448095]
This report describes two methods that were developed for Task 2 of the DCASE 2020 challenge.
The challenge involves an unsupervised learning to detect anomalous sounds, thus only normal machine working condition samples are available during the training process.
The two methods involve deep autoencoders, based on dense and convolutional architectures that use melspectogram processed sound features.
arXiv Detail & Related papers (2020-06-18T10:49:49Z) - Description and Discussion on DCASE2020 Challenge Task2: Unsupervised
Anomalous Sound Detection for Machine Condition Monitoring [36.60410256763345]
We present the results of the DCASE 2020 Challenge Task 2: Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring.
The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous.
arXiv Detail & Related papers (2020-06-10T13:17:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.