CoLoC: Conditioned Localizer and Classifier for Sound Event Localization and Detection
- URL: http://arxiv.org/abs/2210.13932v1
- Date: Tue, 25 Oct 2022 11:37:43 GMT
- Title: CoLoC: Conditioned Localizer and Classifier for Sound Event Localization and Detection
- Authors: Sławomir Kapka, Jakub Tkaczuk
- Abstract summary: We describe Conditioned Localizer and Classifier (CoLoC), a novel solution for Sound Event Localization and Detection (SELD).
The solution consists of two stages: localization is performed first and is followed by classification conditioned on the output of the localizer.
We show that our solution improves on the baseline system in most metrics on the STARSS22 dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this article, we describe Conditioned Localizer and Classifier (CoLoC),
a novel solution for Sound Event Localization and Detection (SELD).
The solution consists of two stages: localization is performed first and is
followed by classification conditioned on the output of the localizer. To
resolve the problem of the unknown number of sources, we incorporate an idea
borrowed from Sequential Set Generation (SSG). The models in both stages are
SELDnet-like CRNNs, but with single outputs. Our analysis shows that two such
single-output models are suitable for the SELD task. We show that our solution
improves on the baseline system in most metrics on the STARSS22 dataset.
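The abstract describes the two-stage inference only at a high level. As a rough illustration of how a sequential-set-generation loop can handle an unknown number of sources, the Python sketch below assumes a per-frame localizer that emits one direction of arrival (DOA) at a time, conditioned on the DOAs it has already emitted, and returns `None` to stop; a classifier then labels each DOA. This is not the authors' code: the names `sequential_seld`, `toy_localizer` and `toy_classifier`, the Cartesian DOA format, and the `max_sources` cap are hypothetical, and the toy functions merely stand in for the trained CRNNs.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumed DOA format for this sketch: a Cartesian unit vector (x, y, z).
Doa = Tuple[float, float, float]
Features = Sequence[float]  # placeholder for one frame of audio features

# Hypothetical single-output model interfaces (an assumption, not the paper's API):
# the localizer sees the features plus every DOA emitted so far and returns ONE
# new DOA, or None meaning "no more sources"; the classifier sees the features
# plus ONE DOA and returns ONE class label for the source at that direction.
Localizer = Callable[[Features, List[Doa]], Optional[Doa]]
Classifier = Callable[[Features, Doa], str]


def sequential_seld(features: Features, localizer: Localizer,
                    classifier: Classifier,
                    max_sources: int = 3) -> List[Tuple[str, Doa]]:
    """Localize sources one at a time, then classify each localized source."""
    found: List[Doa] = []
    # Stage 1: sequential set generation. The loop ends on the localizer's stop
    # signal (or the assumed safety cap), so the source count varies per frame.
    for _ in range(max_sources):
        doa = localizer(features, found)
        if doa is None:
            break
        found.append(doa)
    # Stage 2: classification conditioned on each localized direction.
    return [(classifier(features, doa), doa) for doa in found]


# Toy stand-ins for trained networks, for demonstration only.
SCENE: Dict[Doa, str] = {(1.0, 0.0, 0.0): "speech", (0.0, 1.0, 0.0): "footsteps"}


def toy_localizer(features: Features, found: List[Doa]) -> Optional[Doa]:
    # Return the first scene source not yet emitted, else signal "stop".
    remaining = [d for d in SCENE if d not in found]
    return remaining[0] if remaining else None


def toy_classifier(features: Features, doa: Doa) -> str:
    # Look up the label of the source at the given direction.
    return SCENE[doa]


print(sequential_seld([0.0] * 64, toy_localizer, toy_classifier))
# [('speech', (1.0, 0.0, 0.0)), ('footsteps', (0.0, 1.0, 0.0))]
```

In this sketch each model has a single output, so the variable number of detected events comes entirely from the stop signal in the loop rather than from a fixed set of per-class or per-track outputs.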
Related papers
- SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation
We propose a network architecture for SELD called SELD-Mamba, which utilizes Mamba, a selective state-space model.
We adopt the Event-Independent Network V2 (EINV2) as the foundational framework and replace its Conformer blocks with bidirectional Mamba blocks.
We implement a two-stage training method, with the first stage focusing on Sound Event Detection (SED) and Direction of Arrival (DoA) estimation losses, and the second stage reintroducing the Source Distance Estimation (SDE) loss.
(2024-08-09T13:26:08Z)
- Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
The goal of the multi-sound source localization task is to localize sound sources from the mixture individually.
We present a novel multi-sound source localization method that can perform localization without prior knowledge of the number of sound sources.
(2024-03-26T06:27:50Z)
- Rethinking the Localization in Weakly Supervised Object Localization
Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task.
We propose to replace SCR with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background.
(2023-08-11T14:38:51Z)
- Spatial-Aware Token for Weakly Supervised Object Localization
We propose a task-specific spatial-aware token (SAT) to condition localization in a weakly supervised manner.
Experiments show that the proposed SAT achieves state-of-the-art performance on both CUB-200 and ImageNet, with 98.45% and 73.13% GT-known Loc.
(2023-03-18T15:38:17Z)
- Iterative Sound Source Localization for Unknown Number of Sources
We propose an iterative sound source localization approach called ISSL, which iteratively extracts each source's DOA without a threshold until a termination criterion is met.
Our ISSL achieves significant performance improvements in both DOA estimation and source number detection compared with the existing threshold-based algorithms.
(2022-06-24T13:19:44Z)
- Locate This, Not That: Class-Conditioned Sound Event DOA Estimation
We propose an alternative class-conditioned SELD model for situations where we may not be interested in all classes all of the time.
This class-conditioned SELD model takes as input the spatial and spectral features from the sound file, and also a one-hot vector indicating the class we are currently interested in localizing.
(2022-03-08T16:49:15Z)
- A Hierarchical Model for Spoken Language Recognition
Spoken language recognition (SLR) refers to the automatic process used to determine the language present in a speech sample.
We propose a novel hierarchical approach where two PLDA models are trained, one to generate scores for clusters of highly related languages and a second one to generate scores conditional on each cluster.
We show that this hierarchical approach consistently outperforms the non-hierarchical one for detection of highly related languages.
(2022-01-04T22:10:36Z)
- Denoised Non-Local Neural Network for Semantic Segmentation
We propose a Denoised Non-Local Network (Denoised NL) to eliminate inter-class and intra-class noise, respectively.
Our proposed Denoised NL achieves state-of-the-art performance of 83.5% and 46.69% mIoU on Cityscapes and ADE20K, respectively.
(2021-10-27T06:16:31Z)
- Score-based Generative Modeling in Latent Space
Score-based generative models (SGMs) have recently demonstrated impressive results in terms of both sample quality and distribution coverage.
Here, we propose the Latent Score-based Generative Model (LSGM), a novel approach that trains SGMs in a latent space.
Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space.
(2021-06-10T17:26:35Z)
- Contradictory Structure Learning for Semi-supervised Domain Adaptation
Current adversarial adaptation methods attempt to align the cross-domain features.
Two challenges remain unsolved: 1) the conditional distribution mismatch and 2) the bias of the decision boundary towards the source domain.
We propose a novel framework for semi-supervised domain adaptation by unifying the learning of opposite structures.
(2020-02-06T22:58:20Z)