A Review of Sound Source Localization with Deep Learning Methods
- URL: http://arxiv.org/abs/2109.03465v1
- Date: Wed, 8 Sep 2021 07:25:39 GMT
- Title: A Review of Sound Source Localization with Deep Learning Methods
- Authors: Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, Alexandre Guérin
- Abstract summary: This article is a review of deep learning methods for single and multiple sound source localization.
We provide an exhaustive topography of the neural-based localization literature in this context.
Tables summarizing the literature review are provided at the end of the review for a quick search of methods with a given set of target characteristics.
- Score: 71.18444724397486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article is a review of deep learning methods for single and multiple
sound source localization. We are particularly interested in sound source
localization in indoor/domestic environments, where reverberation and diffuse
noise are present. We provide an exhaustive topography of the neural-based
localization literature in this context, organized according to several
aspects: the neural network architecture, the type of input features, the
output strategy (classification or regression), the types of data used for
model training and evaluation, and the model training strategy. This way, an
interested reader can easily comprehend the vast panorama of deep
learning-based sound source localization methods. Tables summarizing the
literature review are provided at the end of the review for a quick search of
methods with a given set of target characteristics.
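One of the review's organizing axes is the output strategy, i.e. whether the network treats localization as classification over a discretized direction-of-arrival (DoA) grid or as regression of a continuous direction. Below is a minimal illustrative sketch of these two kinds of output head, not code from the paper; the feature dimension, the 5-degree azimuth grid, and the unit-vector parametrization are assumptions chosen for the example.

```python
# Minimal sketch (illustrative, not from the paper) of the two common SSL
# output strategies; all dimensions below are assumptions for the example.
import torch
import torch.nn as nn

FEATURE_DIM = 512          # hypothetical frame-level feature size
N_AZIMUTH_CLASSES = 72     # e.g. a 5-degree azimuth grid (assumption)

class ClassificationHead(nn.Module):
    """Treats localization as classification over a discrete DoA grid."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(FEATURE_DIM, N_AZIMUTH_CLASSES)

    def forward(self, features):
        # Posterior probability per candidate direction; the estimated DoA
        # is typically the argmax (or a peak-picking step for multi-source).
        return torch.softmax(self.fc(features), dim=-1)

class RegressionHead(nn.Module):
    """Treats localization as regression of a continuous direction."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(FEATURE_DIM, 3)  # e.g. a unit vector (x, y, z)

    def forward(self, features):
        # Normalizing yields a point on the unit sphere, i.e. a DoA estimate.
        out = self.fc(features)
        return out / out.norm(dim=-1, keepdim=True).clamp_min(1e-8)

if __name__ == "__main__":
    feats = torch.randn(8, FEATURE_DIM)        # batch of 8 feature frames
    print(ClassificationHead()(feats).shape)   # torch.Size([8, 72])
    print(RegressionHead()(feats).shape)       # torch.Size([8, 3])
```

Classification heads quantize the DoA space, while regression heads predict a continuous direction; this is the distinction the review uses for its output-strategy axis.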
Related papers
- Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment [50.92136296059296]
Cross-modal interaction is vital for understanding semantically matched or mismatched audio-visual events.
New benchmarks and evaluation metrics reveal previously overlooked issues in sound source localization studies.
This work provides the most comprehensive analysis of sound source localization to date.
arXiv Detail & Related papers (2024-07-18T16:51:15Z)
- T-VSL: Text-Guided Visual Sound Source Localization in Mixtures [33.28678401737415]
We develop a framework to disentangle audio-visual source correspondence from multi-source mixtures.
Our framework exhibits promising zero-shot transferability to unseen classes during test time.
Experiments conducted on the MUSIC, VGGSound, and VGGSound-Instruments datasets demonstrate significant performance improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-04-02T09:07:05Z)
- Sound event localization and classification using WASN in Outdoor Environment [2.234738672139924]
Methods for sound event localization and classification typically rely on a single microphone array.
We propose a deep learning-based method that employs multiple features and attention mechanisms to estimate the location and class of the sound source.
arXiv Detail & Related papers (2024-03-29T11:44:14Z)
- Matching Text and Audio Embeddings: Exploring Transfer-learning Strategies for Language-based Audio Retrieval [11.161404854726348]
We present an analysis of large-scale pretrained deep learning models used for cross-modal (text-to-audio) retrieval.
We use embeddings extracted by these models in a metric learning framework to connect matching pairs of audio and text.
arXiv Detail & Related papers (2022-10-06T11:45:14Z)
- Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification [28.670240455952317]
A novel neural network, termed the Acoustic-Net, is proposed to locate and quantify the sound source simply using the original signals.
The experiments demonstrate that the proposed method significantly improves the accuracy of sound source prediction and the computing speed.
arXiv Detail & Related papers (2022-03-31T12:20:09Z)
- Unsupervised Audio Source Separation Using Differentiable Parametric Source Models [8.80867379881193]
We propose an unsupervised model-based deep learning approach to musical source separation.
A neural network is trained to reconstruct the observed mixture as a sum of the sources.
The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods.
arXiv Detail & Related papers (2022-01-24T11:05:30Z)
- PILOT: Introducing Transformers for Probabilistic Sound Event Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms (a generic sketch of this idea is given after this list).
The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
arXiv Detail & Related papers (2021-06-07T18:29:19Z)
- Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness (a generic triplet-loss sketch is given after this list).
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
- Non-Local Part-Aware Point Cloud Denoising [55.50360085086123]
This paper presents a novel non-local part-aware deep neural network to denoise point clouds.
We design the non-local learning unit (NLU) customized with a graph attention module to adaptively capture non-local semantically-related features.
To enhance the denoising performance, we cascade a series of NLUs to progressively distill the noise features from the noisy inputs.
arXiv Detail & Related papers (2020-03-14T13:51:50Z)
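Relating to the PILOT entry above, which captures temporal dependencies in multi-channel audio via self-attention: the following is a minimal generic sketch of that idea using a standard transformer encoder over time frames. The batch, frame, and feature dimensions are assumptions for the example; this is not the paper's actual architecture.

```python
# Illustrative sketch of self-attention over time frames (not PILOT's actual
# architecture); feature and model dimensions are assumptions.
import torch
import torch.nn as nn

BATCH, FRAMES, FEAT = 4, 100, 256   # hypothetical multi-channel feature maps,
                                    # flattened to FEAT values per time frame

encoder_layer = nn.TransformerEncoderLayer(
    d_model=FEAT, nhead=8, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(BATCH, FRAMES, FEAT)   # (batch, time frames, features)
y = encoder(x)                         # each frame attends to every frame
print(y.shape)                         # torch.Size([4, 100, 256])
```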
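Relating to the "Matching Text and Audio Embeddings" and "Unsupervised Cross-Modal Audio Representation Learning" entries above: a minimal sketch of pulling matching audio/text embedding pairs together with a triplet margin loss. The projection layers, dimensions, and random inputs are placeholders, not the papers' actual models or training setups.

```python
# Illustrative triplet-loss sketch for cross-modal (audio/text) embeddings;
# the encoders and dimensions are placeholders, not the papers' models.
import torch
import torch.nn as nn

EMB_DIM = 128
audio_proj = nn.Linear(1024, EMB_DIM)   # hypothetical audio-embedding projector
text_proj = nn.Linear(768, EMB_DIM)     # hypothetical text-embedding projector
criterion = nn.TripletMarginLoss(margin=0.2)

audio = audio_proj(torch.randn(16, 1024))    # anchors: audio clips
text_pos = text_proj(torch.randn(16, 768))   # matching captions
text_neg = text_proj(torch.randn(16, 768))   # non-matching captions

# Pull matching audio/text pairs together, push mismatched pairs apart.
loss = criterion(audio, text_pos, text_neg)
loss.backward()
print(float(loss))
```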
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.