A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods
- URL: http://arxiv.org/abs/2507.01143v1
- Date: Tue, 01 Jul 2025 19:00:50 GMT
- Title: A Review on Sound Source Localization in Robotics: Focusing on Deep Learning Methods
- Authors: Reza Jalayer, Masoud Jalayer, Amirali Baniasadi
- Abstract summary: Sound source localization (SSL) adds a spatial dimension to auditory perception, allowing a system to pinpoint the origin of speech, machinery noise, warning tones, or other acoustic events. This review addresses these gaps by offering a robotics-focused synthesis, emphasizing recent progress in deep learning methodologies.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sound source localization (SSL) adds a spatial dimension to auditory perception, allowing a system to pinpoint the origin of speech, machinery noise, warning tones, or other acoustic events, capabilities that facilitate robot navigation, human-machine dialogue, and condition monitoring. While existing surveys provide valuable historical context, they typically address general audio applications and do not fully account for robotic constraints or the latest advancements in deep learning. This review addresses these gaps by offering a robotics-focused synthesis, emphasizing recent progress in deep learning methodologies. We start by reviewing classical methods such as Time Difference of Arrival (TDOA), beamforming, Steered-Response Power (SRP), and subspace analysis. Subsequently, we delve into modern machine learning (ML) and deep learning (DL) approaches, discussing traditional ML and neural networks (NNs), convolutional neural networks (CNNs), convolutional recurrent neural networks (CRNNs), and emerging attention-based architectures. We then explore data and training strategies, the two cornerstones of DL-based SSL. Studies are further categorized by robot type and application domain to help researchers identify work relevant to their specific contexts. Finally, we highlight current challenges in SSL, namely environmental robustness, sound source multiplicity, and implementation constraints specific to robotics, as well as challenges in data and learning strategies for DL-based SSL. We also sketch promising directions that offer an actionable roadmap toward robust, adaptable, efficient, and explainable DL-based SSL for next-generation robots.
Related papers
- Bridging Brain with Foundation Models through Self-Supervised Learning [5.0273296425814635]
Foundation models (FMs) have redefined the capabilities of artificial intelligence. These advances present a transformative opportunity for brain signal analysis. This survey systematically reviews the emerging field of bridging brain signals with foundation models.
arXiv Detail & Related papers (2025-06-19T04:03:58Z)
- LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control [80.86089324742024]
We propose a learnable Perception-Action-Communication (LPAC) architecture for the problem.
A CNN processes localized perception; a graph neural network (GNN) facilitates robot communications.
Evaluations show that the LPAC models outperform standard decentralized and centralized coverage control algorithms.
arXiv Detail & Related papers (2024-01-10T00:08:00Z)
- ML-ASPA: A Contemplation of Machine Learning-based Acoustic Signal Processing Analysis for Sounds, & Strains Emerging Technology [0.0]
This inquiry explores recent advancements and transformative potential within the domain of acoustics, specifically focusing on machine learning (ML) and deep learning.
ML adopts a data-driven approach, unveiling intricate relationships between features and desired labels or actions, as well as among features themselves.
The application of ML to expansive sets of training data facilitates the discovery of models elucidating complex acoustic phenomena such as human speech and reverberation.
arXiv Detail & Related papers (2023-12-18T03:04:42Z)
- Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
arXiv Detail & Related papers (2023-11-22T02:45:01Z)
- A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends [82.64268080902742]
Self-supervised learning (SSL) aims to learn discriminative features from unlabeled data without relying on human-annotated labels.
SSL has garnered significant attention recently, leading to the development of numerous related algorithms.
This paper presents a review of diverse SSL methods, encompassing algorithmic aspects, application domains, three key trends, and open research questions.
arXiv Detail & Related papers (2023-01-13T14:41:05Z)
- SSL-Lanes: Self-Supervised Learning for Motion Forecasting in Autonomous Driving [9.702784248870522]
Self-supervised learning (SSL) is an emerging technique to train convolutional neural networks (CNNs) and graph neural networks (GNNs).
In this study, we report the first systematic exploration of incorporating self-supervision into motion forecasting.
arXiv Detail & Related papers (2022-06-28T16:23:25Z)
- Audio Self-supervised Learning: A Survey [60.41768569891083]
Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations.
Its success in the fields of computer vision and natural language processing has prompted its recent adoption into the field of audio and speech processing.
arXiv Detail & Related papers (2022-03-02T15:58:29Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- Deep learning approaches for neural decoding: from CNNs to LSTMs and spikes to fMRI [2.0178765779788495]
Decoding behavior, perception, or cognitive state directly from neural signals has applications in brain-computer interface research.
In the last decade, deep learning has become the state-of-the-art method in many machine learning tasks.
Deep learning has been shown to be a useful tool for improving the accuracy and flexibility of neural decoding across a wide range of tasks.
arXiv Detail & Related papers (2020-05-19T18:10:35Z)