Related papers: Position tracking of a varying number of sound sources with sliding permutation invariant training

Position tracking of a varying number of sound sources with sliding permutation invariant training

URL: http://arxiv.org/abs/2210.14536v2
Date: Mon, 5 Jun 2023 11:14:23 GMT
Title: Position tracking of a varying number of sound sources with sliding permutation invariant training
Authors: David Diaz-Guerra, Archontis Politis and Tuomas Virtanen
Abstract summary: We present a new training strategy for deep learning sound source localization models. It is based on the mean squared error of the optimal association between estimated and reference positions. It minimizes identity switches without compromising frame-wise localization accuracy.
Score: 19.873949136858354
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios. However, little work has been done on adapting such methods to track consistently multiple sources appearing and disappearing, as would occur in reality. In this paper, we present a new training strategy for deep learning SSL models with a straightforward implementation based on the mean squared error of the optimal association between estimated and reference positions in the preceding time frames. It optimizes the desired properties of a tracking system: handling a time-varying number of sources and ordering localization estimates according to their trajectories, minimizing identity switches (IDSs). Evaluation on simulated data of multiple reverberant moving sources and on two model architectures proves its effectiveness on reducing identity switches without compromising frame-wise localization accuracy.

Related papers

Improved Localized Machine Unlearning Through the Lens of Memorization [23.30800397324838]
We study localized unlearning, where the unlearning algorithm operates on a small subset of parameters. We propose an improved localization strategy that yields strong results when paired with existing unlearning algorithms. We also propose a new unlearning algorithm, Deletion by Example localization (DEL), that resets the parameters deemed-to-be most critical.
arXiv Detail & Related papers (2024-12-03T12:57:08Z)
SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio [17.811771707446926]
We show that learning based methods can, even based on synthetic data, significantly outperform GCC-PHAT on novel real world data. We provide our trained model, SONNET, which is runnable in real-time and works on novel data out of the box for many real data applications.
arXiv Detail & Related papers (2024-11-20T10:23:21Z)
Step-wise Distribution Alignment Guided Style Prompt Tuning for Source-free Cross-domain Few-shot Learning [53.60934432718044]
Cross-domain few-shot learning methods face challenges with large-scale pre-trained models due to inaccessible source data and training strategies. This paper introduces Step-wise Distribution Alignment Guided Style Prompt Tuning (StepSPT) StepSPT implicitly narrows domain gaps through prediction distribution optimization.
arXiv Detail & Related papers (2024-11-15T09:34:07Z)
Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets [0.0]
We propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS. Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations.
arXiv Detail & Related papers (2024-06-11T07:32:25Z)
Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift [3.303002683812084]
We propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem. Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
arXiv Detail & Related papers (2023-03-12T02:49:19Z)
Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS) We focus on cyclostationary signals, which are particularly suitable in a variety of application domains. We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z)
Iterative Sound Source Localization for Unknown Number of Sources [57.006589498243336]
We propose an iterative sound source localization approach called ISSL, which can iteratively extract each source's DOA without threshold until the termination criterion is met. Our ISSL achieves significant performance improvements in both DOA estimation and source number detection compared with the existing threshold-based algorithms.
arXiv Detail & Related papers (2022-06-24T13:19:44Z)
A Review of Sound Source Localization with Deep Learning Methods [71.18444724397486]
This article is a review on deep learning methods for single and multiple sound source localization. We provide an exhaustive topography of the neural-based localization literature in this context. Tables summarizing the literature review are provided at the end of the review for a quick search of methods with a given set of target characteristics.
arXiv Detail & Related papers (2021-09-08T07:25:39Z)
PILOT: Introducing Transformers for Probabilistic Sound Event Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms. The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
arXiv Detail & Related papers (2021-06-07T18:29:19Z)
Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios. For dataset bias due to different samplers, we propose shifted batch normalization. Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.