Audio Simulation for Sound Source Localization in Virtual Environment
- URL: http://arxiv.org/abs/2404.01611v1
- Date: Tue, 2 Apr 2024 03:18:28 GMT
- Title: Audio Simulation for Sound Source Localization in Virtual Environment
- Authors: Yi Di Yuan, Swee Liang Wong, Jonathan Pan
- Abstract summary: Non-line-of-sight localization in signal-deprived environments is a challenging yet pertinent problem.
In this study, we aim to localize sound sources to specific locations within a virtual environment by leveraging physically grounded sound propagation simulations and machine learning methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-line-of-sight localization in signal-deprived environments is a challenging yet pertinent problem. Acoustic methods in such predominantly indoor scenarios encounter difficulty due to the reverberant nature of these spaces. In this study, we aim to localize sound sources to specific locations within a virtual environment by leveraging physically grounded sound propagation simulations and machine learning methods. This approach addresses the issue of data insufficiency when localizing sound sources to their location of occurrence, especially in post-event localization. We achieve an F1-score of 0.786 ± 0.0136 using an audio transformer spectrogram approach.
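Below is a minimal sketch of what a spectrogram-plus-audio-transformer location classifier of this kind could look like, assuming PyTorch; the layer sizes, number of candidate locations, and mel-spectrogram settings are illustrative placeholders, not the authors' configuration.

```python
# Hypothetical sketch: classify a waveform into one of several candidate
# source locations via a mel spectrogram and a transformer encoder.
import torch
import torch.nn as nn
import torchaudio

NUM_LOCATIONS = 16  # hypothetical number of candidate source locations

class SpectrogramTransformer(nn.Module):
    def __init__(self, n_mels=64, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_mels=n_mels)
        self.proj = nn.Linear(n_mels, d_model)  # one token per time frame
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, NUM_LOCATIONS)

    def forward(self, waveform):                    # (batch, samples)
        spec = self.melspec(waveform)               # (batch, n_mels, frames)
        tokens = self.proj(spec.transpose(1, 2))    # (batch, frames, d_model)
        encoded = self.encoder(tokens)
        return self.head(encoded.mean(dim=1))       # logits over locations

logits = SpectrogramTransformer()(torch.randn(2, 16000))
```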
Related papers
- Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation [25.410770364140856]
Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain.
This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs).
We introduce the notion of dynamic perturbation, which can inject controlled perturbations into the noise embeddings during inference.
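A minimal sketch of the dynamic perturbation idea as summarized above: a controlled stochastic perturbation injected into a noise embedding at inference time. The norm-preserving rescaling and the scale values are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: inject zero-mean Gaussian perturbation into a
# noise embedding at inference time to diversify simulated noise.
import torch

def perturb_noise_embedding(noise_emb: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    perturbed = noise_emb + scale * torch.randn_like(noise_emb)
    # Rescale so the perturbed embedding keeps the original norm (a design
    # choice for this sketch, not taken from the paper).
    return perturbed * (noise_emb.norm() / perturbed.norm())

emb = torch.randn(256)  # stand-in for an extracted noise embedding
variants = [perturb_noise_embedding(emb, s) for s in (0.05, 0.1, 0.2)]
```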
arXiv Detail & Related papers (2024-09-03T02:29:01Z)
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- Sound event localization and classification using WASN in Outdoor Environment [2.234738672139924]
Methods for sound event localization and classification typically rely on a single microphone array.
We propose a deep learning-based method that employs multiple features and attention mechanisms to estimate the location and class of the sound source.
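A minimal sketch of attention-weighted fusion of per-node features, one plausible reading of the multi-feature attention mechanism described above; the single-score design and the dimensions are assumptions, not the paper's architecture.

```python
# Hypothetical sketch: fuse feature vectors from several sensor-network
# nodes with learned attention weights over the nodes.
import torch
import torch.nn as nn

class NodeAttentionFusion(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # per-node reliability score

    def forward(self, node_feats):                  # (batch, nodes, feat_dim)
        weights = torch.softmax(self.score(node_feats), dim=1)
        return (weights * node_feats).sum(dim=1)    # (batch, feat_dim)

fused = NodeAttentionFusion()(torch.randn(4, 6, 128))  # 6 sensor nodes
```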
arXiv Detail & Related papers (2024-03-29T11:44:14Z)
- Attention-Driven Multichannel Speech Enhancement in Moving Sound Source Scenarios [11.811571392419324]
Speech enhancement algorithms typically assume a stationary sound source, a common mismatch with reality that limits their performance in real-world scenarios.
This paper focuses on attention-driven spatial filtering techniques designed for dynamic settings.
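For context, below is a classic fixed spatial filter (delay-and-sum beamforming) of the kind such attention-driven methods aim to improve on in dynamic settings; the microphone count and integer sample delays are illustrative placeholders.

```python
# Illustrative baseline: delay-and-sum beamforming toward a fixed
# look direction by aligning each channel and averaging.
import numpy as np

def delay_and_sum(mics: np.ndarray, delays_samples: np.ndarray) -> np.ndarray:
    """mics: (channels, samples); delays_samples: integer delay per channel."""
    aligned = [np.roll(ch, -int(d)) for ch, d in zip(mics, delays_samples)]
    return np.mean(aligned, axis=0)

x = np.random.randn(4, 16000)                 # 4-mic recording (placeholder)
y = delay_and_sum(x, np.array([0, 2, 4, 6]))  # steering delays (placeholder)
```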
arXiv Detail & Related papers (2023-12-17T16:12:35Z)
- Sound Source Localization is All about Cross-Modal Alignment [53.957081836232206]
Cross-modal semantic understanding is essential for genuine sound source localization.
We propose a joint task with sound source localization to better learn the interaction between audio and visual modalities.
Our method outperforms the state-of-the-art approaches in both sound source localization and cross-modal retrieval.
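A minimal sketch of learning audio-visual interaction via a symmetric contrastive (InfoNCE-style) objective, a common formulation for cross-modal alignment; it is not necessarily the paper's exact loss.

```python
# Hypothetical sketch: pull matched audio/visual embedding pairs together
# and push mismatched pairs apart with a symmetric contrastive loss.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(audio_emb, visual_emb, temperature=0.07):
    a = F.normalize(audio_emb, dim=-1)   # (batch, dim)
    v = F.normalize(visual_emb, dim=-1)
    logits = a @ v.t() / temperature     # pairwise similarities
    targets = torch.arange(a.size(0))    # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = contrastive_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```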
arXiv Detail & Related papers (2023-09-19T16:04:50Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
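To make the RIR's role concrete: reverberant audio is the dry source signal convolved with the room impulse response. A minimal sketch follows, using a synthetic decaying-noise RIR as a placeholder for an inferred one.

```python
# Illustrative sketch: an RIR transforms a dry signal into reverberant
# audio via convolution; the RIR here is synthetic, not inferred.
import numpy as np
from scipy.signal import fftconvolve

sr = 16000
dry = np.random.randn(sr)  # 1 s dry source signal (placeholder)
rir = np.random.randn(sr // 2) * np.exp(-6 * np.linspace(0, 1, sr // 2))
wet = fftconvolve(dry, rir)[: len(dry)]  # reverberant signal heard by listener
```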
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
- Visual Sound Localization in the Wild by Cross-Modal Interference Erasing [90.21476231683008]
In real-world scenarios, audio is usually contaminated by off-screen sounds and background noise.
We propose the Interference Eraser (IEr) framework, which tackles the problem of audio-visual sound source localization in the wild.
arXiv Detail & Related papers (2022-02-13T21:06:19Z)
- A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments [1.0527821704930371]
In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speakers within it.
We create two virtual environments using the Unity game engine, one presenting an audio-based navigation problem and one presenting an audio source localization problem.
We also create an autonomous agent based on the PPO online reinforcement learning algorithm and attempt to train it to solve these environments.
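A minimal sketch of such a PPO training setup, assuming Stable-Baselines3; the Unity environments from the paper would need a Gym-compatible wrapper (e.g., via ML-Agents), so a standard Gymnasium task stands in here as a placeholder.

```python
# Hypothetical sketch: train a PPO agent with Stable-Baselines3 on a
# placeholder Gymnasium task standing in for the Unity audio environments.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")        # placeholder for the Unity audio task
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # on-policy PPO updates

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```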
arXiv Detail & Related papers (2021-10-25T10:18:34Z)
- A Review of Sound Source Localization with Deep Learning Methods [71.18444724397486]
This article is a review of deep learning methods for single and multiple sound source localization.
We provide an exhaustive topography of the neural-based localization literature in this context.
Tables summarizing the literature review are provided at the end of the review for a quick search of methods with a given set of target characteristics.
arXiv Detail & Related papers (2021-09-08T07:25:39Z)
- AcousticFusion: Fusing Sound Source Localization to Visual SLAM in Dynamic Environments [19.413143126734383]
We propose a novel audio-visual fusion approach that fuses sound source direction into the RGB-D image.
The proposed method requires very little computation to obtain stable self-localization results.
arXiv Detail & Related papers (2021-08-03T02:10:26Z)
- PILOT: Introducing Transformers for Probabilistic Sound Event Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms.
The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
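A minimal sketch of capturing temporal dependencies in multi-channel audio with self-attention, in the spirit of the summary above; the channel count, feature sizes, and flatten-then-project tokenization are assumptions, not the PILOT architecture.

```python
# Hypothetical sketch: treat each spectrogram frame as a token so that
# self-attention relates every time step to every other.
import torch
import torch.nn as nn

channels, n_mels, frames = 4, 64, 200
spec = torch.randn(1, channels, n_mels, frames)  # multi-channel spectrogram

tokens = spec.permute(0, 3, 1, 2).flatten(2)     # (batch, frames, channels*n_mels)
proj = nn.Linear(channels * n_mels, 128)
attn = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=2,
)
encoded = attn(proj(tokens))  # each frame attends to every other frame
```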
arXiv Detail & Related papers (2021-06-07T18:29:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.