Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments
- URL: http://arxiv.org/abs/2201.04279v1
- Date: Wed, 12 Jan 2022 03:08:03 GMT
- Title: Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments
- Authors: Abdelrahman Younes
- Abstract summary: We introduce the novel dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in an unmapped environment in the presence of distractors and noisy sounds.
Our approach outperforms the current state-of-the-art with better generalization to unheard sounds and better robustness to noisy scenarios.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work on audio-visual navigation targets a single static sound in
noise-free audio environments and struggles to generalize to unheard sounds. We
introduce the novel dynamic audio-visual navigation benchmark in which an
embodied AI agent must catch a moving sound source in an unmapped environment
in the presence of distractors and noisy sounds. We propose an end-to-end
reinforcement learning approach that relies on a multi-modal architecture that
fuses the spatial audio-visual information from a binaural audio signal and
spatial occupancy maps to encode the features needed to learn a robust
navigation policy for our new complex task settings. We demonstrate that our
approach outperforms the current state-of-the-art with better generalization to
unheard sounds and better robustness to noisy scenarios on the two challenging
3D scanned real-world datasets Replica and Matterport3D, for the static and
dynamic audio-visual navigation benchmarks. Our novel benchmark will be made
available at http://dav-nav.cs.uni-freiburg.de.
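The abstract describes a multi-modal architecture that fuses features from a binaural audio signal with spatial occupancy maps before feeding a navigation policy. The sketch below illustrates that late-fusion pattern in plain NumPy; all shapes, the random-projection "encoders", and the four-action policy head are illustrative assumptions, not the paper's actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input shapes (assumptions, not taken from the paper):
# binaural spectrogram: 2 channels x 65 freq bins x 26 time frames
# local occupancy map:  2 channels (occupied / explored) x 31 x 31 cells
spectrogram = rng.standard_normal((2, 65, 26))
occupancy = rng.standard_normal((2, 31, 31))

def encode(x, out_dim, key):
    """Flatten an input and project it with a fixed random matrix,
    standing in for a learned CNN encoder."""
    flat = x.reshape(-1)
    w = np.random.default_rng(key).standard_normal((out_dim, flat.size))
    return np.tanh((w / np.sqrt(flat.size)) @ flat)

audio_feat = encode(spectrogram, 128, key=1)
map_feat = encode(occupancy, 128, key=2)

# Late fusion: concatenate per-modality features before the policy head.
fused = np.concatenate([audio_feat, map_feat])

# Policy head over 4 discrete actions (e.g. forward, left, right, stop).
w_pi = np.random.default_rng(3).standard_normal((4, fused.size)) / np.sqrt(fused.size)
logits = w_pi @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs.shape)  # (4,)
```

In an actual end-to-end RL setup, the two encoders and the policy head would be trained jointly from the navigation reward; this snippet only shows how the two modalities meet at a shared feature vector.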
Related papers
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
- SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning [127.1119359047849]
We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments.
It generates highly realistic acoustics for arbitrary sounds captured from arbitrary microphone locations.
SoundSpaces 2.0 is publicly available to facilitate wider research for perceptual systems that can both see and hear.
arXiv Detail & Related papers (2022-06-16T17:17:44Z)
- Towards Generalisable Audio Representations for Audio-Visual Navigation [18.738943602529805]
In audio-visual navigation (AVN), an intelligent agent needs to navigate to a constantly sound-making object in complex 3D environments.
We propose a contrastive learning-based method to tackle this challenge by regularising the audio encoder.
arXiv Detail & Related papers (2022-06-01T11:00:07Z)
- Active Audio-Visual Separation of Dynamic Sound Sources [93.97385339354318]
We propose a reinforcement learning agent equipped with a novel transformer memory that learns motion policies to control its camera and microphone.
We show that our model is able to learn efficient behavior to carry out continuous separation of a time-varying audio target.
arXiv Detail & Related papers (2022-02-02T02:03:28Z)
- Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds [5.002862602915434]
Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment.
We propose the novel dynamic audio-visual navigation benchmark, which requires the agent to catch a moving sound source in an environment with noisy and distracting sounds.
We demonstrate that our approach consistently outperforms the current state-of-the-art by a large margin across all tasks of moving sounds, unheard sounds, and noisy environments.
arXiv Detail & Related papers (2021-11-29T15:17:46Z)
- Move2Hear: Active Audio-Visual Source Separation [90.16327303008224]
We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest.
We introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time.
We demonstrate our model's ability to find minimal movement sequences with maximal payoff for audio source separation.
arXiv Detail & Related papers (2021-05-15T04:58:08Z)
- Semantic Audio-Visual Navigation [93.12180578267186]
We introduce semantic audio-visual navigation, where objects in the environment make sounds consistent with their semantic meaning.
We propose a transformer-based model to tackle this new semantic AudioGoal task.
Our method strongly outperforms existing audio-visual navigation methods by learning to associate semantic, acoustic, and visual cues.
arXiv Detail & Related papers (2020-12-21T18:59:04Z)
- Learning to Set Waypoints for Audio-Visual Navigation [89.42192208471735]
In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sight and sound to find a sound source.
Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations.
We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements.
arXiv Detail & Related papers (2020-08-21T18:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.