Learning Multiple Sound Source 2D Localization
- URL: http://arxiv.org/abs/2012.05515v1
- Date: Thu, 10 Dec 2020 08:51:16 GMT
- Title: Learning Multiple Sound Source 2D Localization
- Authors: Guillaume Le Moing, Phongtharin Vinayavekhin, Tadanobu Inoue, Jayakorn
Vongkulbhisal, Asim Munawar, Ryuki Tachibana, Don Joven Agravante
- Abstract summary: We propose novel deep learning based algorithms for multiple sound source localization.
We use an encoding-decoding architecture and propose two improvements on it to accomplish the task.
New metrics are developed relying on resolution-based multiple source association.
- Score: 7.564344795030588
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose novel deep learning based algorithms for multiple
sound source localization. Specifically, we aim to find the 2D Cartesian
coordinates of multiple sound sources in an enclosed environment by using
multiple microphone arrays. To this end, we use an encoding-decoding
architecture and propose two improvements on it to accomplish the task. In
addition, we also propose two novel localization representations which increase
the accuracy. Lastly, new metrics are developed relying on resolution-based
multiple source association which enables us to evaluate and compare different
localization approaches. We tested our method on both synthetic and real world
data. The results show that our method improves upon the previous baseline
approach for this problem.
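The abstract mentions localization representations that increase accuracy. A common representation for this kind of 2D grid-based localization is a Gaussian heatmap that is decoded back to coordinates by peak picking; the toy sketch below illustrates only that general idea — the grid size, room dimensions, and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def encode_heatmap(sources, grid=(50, 50), room=(5.0, 5.0), sigma=2.0):
    """Render each 2D source position (metres) as a Gaussian blob on a grid."""
    H, W = grid
    ys, xs = np.mgrid[0:H, 0:W]
    heat = np.zeros((H, W))
    for (x, y) in sources:
        cx = x / room[0] * (W - 1)  # map metres -> grid cells
        cy = y / room[1] * (H - 1)
        blob = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, blob)
    return heat

def decode_peaks(heat, room=(5.0, 5.0), thresh=0.5):
    """Recover source coordinates as local maxima above a threshold."""
    H, W = heat.shape
    pad = np.pad(heat, 1, constant_values=0.0)
    centre = pad[1:-1, 1:-1]
    is_peak = centre >= thresh
    # a cell is a peak if it is at least as large as all 8 neighbours
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_peak &= centre >= pad[1 + dy:H + 1 + dy, 1 + dx:W + 1 + dx]
    cy, cx = np.nonzero(is_peak)
    return [(x / (W - 1) * room[0], y / (H - 1) * room[1])
            for y, x in zip(cy, cx)]
```

Encoding two sources at (1.0, 2.0) and (4.0, 3.5) and decoding the heatmap recovers both positions to within one grid cell, which is the resolution limit such a representation trades off against map size.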
Related papers
- Local and Global Decoding in Text Generation [36.38298679687864]
Text generation relies on decoding algorithms that sample strings from a language model distribution.
We investigate the effect of distortion by introducing globally-normalised versions of these decoding methods.
Our results suggest that distortion is an important feature of local decoding algorithms.
arXiv Detail & Related papers (2024-10-14T17:59:38Z)
- Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge [14.801564966406486]
The goal of the multi-sound source localization task is to localize each sound source in a mixture individually.
We present a novel multi-sound source localization method that can perform localization without prior knowledge of the number of sound sources.
arXiv Detail & Related papers (2024-03-26T06:27:50Z)
- Iterative Sound Source Localization for Unknown Number of Sources [57.006589498243336]
We propose an iterative sound source localization approach called ISSL, which iteratively extracts each source's DOA without a threshold until its termination criterion is met.
Our ISSL achieves significant performance improvements in both DOA estimation and source number detection compared with the existing threshold-based algorithms.
arXiv Detail & Related papers (2022-06-24T13:19:44Z)
- Quality-Aware Decoding for Neural Machine Translation [64.24934199944875]
We propose quality-aware decoding for neural machine translation (NMT).
We leverage recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods.
We find that quality-aware decoding consistently outperforms MAP-based decoding according both to state-of-the-art automatic metrics and to human assessments.
arXiv Detail & Related papers (2022-05-02T15:26:28Z)
- Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification [28.670240455952317]
A novel neural network, termed the Acoustic-Net, is proposed to locate and quantify the sound source simply using the original signals.
The experiments demonstrate that the proposed method significantly improves the accuracy of sound source prediction and the computing speed.
arXiv Detail & Related papers (2022-03-31T12:20:09Z)
- Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes [91.59435809457659]
Self-Supervised Predictive Learning (SSPL) is a negative-free method for sound localization via explicit positive mining.
SSPL achieves significant improvements of 8.6% cIoU and 3.4% AUC on SoundNet-Flickr compared to the previous best.
arXiv Detail & Related papers (2022-03-25T01:42:42Z)
- Active Restoration of Lost Audio Signals Using Machine Learning and Latent Information [0.7252027234425334]
This paper proposes the combination of steganography, halftoning (dithering), and state-of-the-art shallow and deep learning methods.
We show improvement in the inpainting performance in terms of signal-to-noise ratio (SNR), the objective difference grade (ODG) and Hansen's audio quality metric.
arXiv Detail & Related papers (2021-11-21T20:11:33Z)
- A Review of Sound Source Localization with Deep Learning Methods [71.18444724397486]
This article is a review on deep learning methods for single and multiple sound source localization.
We provide an exhaustive topography of the neural-based localization literature in this context.
Tables summarizing the literature review are provided at the end of the review for a quick search of methods with a given set of target characteristics.
arXiv Detail & Related papers (2021-09-08T07:25:39Z)
- PILOT: Introducing Transformers for Probabilistic Sound Event Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms.
The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
arXiv Detail & Related papers (2021-06-07T18:29:19Z)
- Dual Normalization Multitasking for Audio-Visual Sounding Object Localization [0.0]
We propose a new concept, Sounding Object, to reduce the ambiguity of the visual location of sound.
To tackle this new AVSOL problem, we propose a novel multitask training strategy and architecture called Dual Normalization Multitasking.
arXiv Detail & Related papers (2021-06-01T02:02:52Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.