Sound Adversarial Audio-Visual Navigation
- URL: http://arxiv.org/abs/2202.10910v1
- Date: Tue, 22 Feb 2022 14:19:42 GMT
- Title: Sound Adversarial Audio-Visual Navigation
- Authors: Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu
- Abstract summary: Existing audio-visual navigation works assume a clean environment that solely contains the target sound.
In this work, we design an acoustically complex environment in which there exists a sound attacker playing a zero-sum game with the agent.
Under certain constraints on the attacker, we can improve the robustness of the agent towards unexpected sound attacks in audio-visual navigation.
- Score: 43.962774217305935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The audio-visual navigation task requires an agent to find a sound source in a
realistic, unmapped 3D environment by utilizing egocentric audio-visual
observations. Existing audio-visual navigation works assume a clean environment
that solely contains the target sound, an assumption that does not hold in
most real-world applications because of unexpected sound noise or intentional
interference. In this work, we design an acoustically complex environment in
which, besides the target sound, there exists a sound attacker playing a
zero-sum game with the agent. More specifically, the attacker can move and
change the volume and category of its sound to make it harder for the agent to
find the sounding object, while the agent tries to dodge the attack and
navigate to the goal despite the interference. Under certain constraints on the
attacker, we can improve the robustness of the agent towards unexpected sound
attacks in audio-visual navigation. For better convergence, we develop a joint
training mechanism that uses a centralized critic with decentralized actors.
Experiments on two real-world 3D scan datasets, Replica and Matterport3D,
verify the effectiveness and the robustness of the agent
trained under our designed environment when transferred to the clean
environment or to one containing sound attackers with random policies. Project:
https://yyf17.github.io/SAAVN.
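The adversarial setup described in the abstract can be pictured with a small sketch. This is a hypothetical illustration, not the authors' implementation: the movement bound, volume range, sound categories, and the names `AttackerAction`, `constrain`, `zero_sum_rewards`, and `centralized_advantages` are all assumptions. It shows three ideas from the abstract: an attacker whose actions (move, change volume, change category) are clipped to a constrained set, zero-sum rewards, and one centralized critic whose TD advantages are shared by both decentralized actors.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical constraint values; the paper's actual limits are not given here.
MAX_STEP = 1                            # max grid cells the attacker may move per step
VOLUME_RANGE = (0.0, 1.0)               # allowed attacker volume scale
CATEGORIES = ("dog", "phone", "piano")  # hypothetical attacker sound categories


@dataclass(frozen=True)
class AttackerAction:
    dx: int        # movement along x, in grid cells
    dy: int        # movement along y, in grid cells
    volume: float  # relative volume of the attacking sound
    category: str  # which sound the attacker plays


def constrain(action: AttackerAction) -> AttackerAction:
    """Clip a raw attacker action to the (assumed) allowed action set."""
    dx = max(-MAX_STEP, min(MAX_STEP, action.dx))
    dy = max(-MAX_STEP, min(MAX_STEP, action.dy))
    lo, hi = VOLUME_RANGE
    volume = max(lo, min(hi, action.volume))
    # Unknown categories fall back to the first allowed one.
    category = action.category if action.category in CATEGORIES else CATEGORIES[0]
    return AttackerAction(dx, dy, volume, category)


def zero_sum_rewards(agent_reward: float) -> tuple[float, float]:
    """Zero-sum game: the attacker gains exactly what the agent loses."""
    return agent_reward, -agent_reward


def centralized_advantages(
    rewards: list[float], values: list[float], next_value: float, gamma: float = 0.99
) -> tuple[list[float], list[float]]:
    """One-step TD advantages from a single critic on the joint state.

    values[t] is the critic's estimate V(s_t) of the agent's return. Because the
    game is zero-sum, the attacker's value is -V, so its advantage is the negation.
    """
    agent_adv = []
    for t, r in enumerate(rewards):
        v_next = values[t + 1] if t + 1 < len(values) else next_value
        agent_adv.append(r + gamma * v_next - values[t])
    attacker_adv = [-a for a in agent_adv]
    return agent_adv, attacker_adv


if __name__ == "__main__":
    print(constrain(AttackerAction(dx=3, dy=-5, volume=1.7, category="cat")))
    print(zero_sum_rewards(2.5))
    print(centralized_advantages([1.0, 0.0], [0.5, 0.2], next_value=0.0, gamma=0.5))
```

Deriving both actors' advantages from one critic is what makes the critic "centralized" in this sketch; each actor would still choose actions from its own observations. A real training loop would feed these advantages into a policy-gradient update, which is omitted here.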
Related papers
- Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear [65.33183123368804]
Sonicverse is a multisensory simulation platform with integrated audio-visual simulation.
It enables embodied AI tasks that need audio-visual perception.
An agent trained in Sonicverse can successfully perform audio-visual navigation in real-world environments.
arXiv (2023-06-01T17:24:01Z)
- Towards Generalisable Audio Representations for Audio-Visual Navigation [18.738943602529805]
In audio-visual navigation (AVN), an intelligent agent needs to navigate to a constantly sound-making object in complex 3D environments.
We propose a contrastive learning-based method to tackle this challenge by regularising the audio encoder.
arXiv (2022-06-01T11:00:07Z)
- Visual Acoustic Matching [92.91522122739845]
We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.
Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials.
arXiv (2022-02-14T17:05:22Z)
- Active Audio-Visual Separation of Dynamic Sound Sources [93.97385339354318]
We propose a reinforcement learning agent equipped with a novel transformer memory that learns motion policies to control its camera and microphone.
We show that our model is able to learn efficient behavior to carry out continuous separation of a time-varying audio target.
arXiv (2022-02-02T02:03:28Z)
- Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments [0.0]
We introduce the novel dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in an unmapped environment in the presence of distractors and noisy sounds.
Our approach outperforms the current state-of-the-art with better generalization to unheard sounds and better robustness to noisy scenarios.
arXiv (2022-01-12T03:08:03Z)
- Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds [5.002862602915434]
Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment.
We propose the novel dynamic audio-visual navigation benchmark, which requires catching a moving sound source in an environment with noisy and distracting sounds.
We demonstrate that our approach consistently outperforms the current state-of-the-art by a large margin across all tasks of moving sounds, unheard sounds, and noisy environments.
arXiv (2021-11-29T15:17:46Z)
- Move2Hear: Active Audio-Visual Source Separation [90.16327303008224]
We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest.
We introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time.
We demonstrate our model's ability to find minimal movement sequences with maximal payoff for audio source separation.
arXiv (2021-05-15T04:58:08Z)
- Semantic Audio-Visual Navigation [93.12180578267186]
We introduce semantic audio-visual navigation, where objects in the environment make sounds consistent with their semantic meaning.
We propose a transformer-based model to tackle this new semantic AudioGoal task.
Our method strongly outperforms existing audio-visual navigation methods by learning to associate semantic, acoustic, and visual cues.
arXiv (2020-12-21T18:59:04Z)