Multi-goal Audio-visual Navigation using Sound Direction Map
- URL: http://arxiv.org/abs/2308.00219v1
- Date: Tue, 1 Aug 2023 01:26:55 GMT
- Title: Multi-goal Audio-visual Navigation using Sound Direction Map
- Authors: Haru Kondoh and Asako Kanezaki
- Abstract summary: We propose a new framework for multi-goal audio-visual navigation.
The research shows that multi-goal audio-visual navigation implicitly requires the agent to separate multiple sound sources, which makes the task harder.
We propose a method named sound direction map (SDM), which dynamically localizes multiple sound sources in a learning-based manner.
- Score: 10.152838128195468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past few years, there has been a great deal of research on
navigation tasks in indoor environments using deep reinforcement learning
agents. Most of these tasks use only visual information in the form of
first-person images to navigate to a single goal. More recently, tasks that
simultaneously use visual and auditory information to navigate to the sound
source and even navigation tasks with multiple goals instead of one have been
proposed. However, there has been no proposal for a generalized navigation task
combining these two types of tasks and using both visual and auditory
information in a situation where multiple sound sources are goals. In this
paper, we propose a new framework for this generalized task: multi-goal
audio-visual navigation. We first define the task in detail, and then we
investigate the difficulty of the multi-goal audio-visual navigation task
relative to the current navigation tasks by conducting experiments in various
situations. The experiments show that multi-goal audio-visual navigation is
difficult in part because it implicitly requires the agent to separate multiple
sound sources. Next, to
mitigate the difficulties in this new task, we propose a method named sound
direction map (SDM), which dynamically localizes multiple sound sources in a
learning-based manner while making use of past memories. Experimental results
show that the use of SDM significantly improves the performance of multiple
baseline methods, regardless of the number of goals.
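The core idea behind the SDM, accumulating sound-direction evidence over time while retaining past memories, can be illustrated with a toy sketch. Note that the class, grid resolution, decay factor, and hand-written update rule below are all hypothetical illustrations for exposition; the paper's SDM localizes sources in a learning-based manner rather than with a fixed rule.

```python
import numpy as np

# Hypothetical sketch of a "sound direction map": a top-down grid that
# accumulates direction-of-arrival (DoA) estimates over time. The real SDM
# is learned end-to-end; this only illustrates the map-update idea.
class SoundDirectionMap:
    def __init__(self, size=64, decay=0.95):
        self.grid = np.zeros((size, size))  # evidence for a source per cell
        self.decay = decay                  # forgetting factor for old evidence

    def update(self, agent_xy, doa_rad, confidence, max_range=20):
        """Decay past evidence, then cast a ray along the estimated DoA."""
        self.grid *= self.decay
        x, y = agent_xy
        for r in range(1, max_range):
            cx = int(round(x + r * np.cos(doa_rad)))
            cy = int(round(y + r * np.sin(doa_rad)))
            if not (0 <= cx < self.grid.shape[0] and 0 <= cy < self.grid.shape[1]):
                break
            self.grid[cx, cy] += confidence / r  # nearer cells get more weight

    def likely_sources(self, k=2):
        """Return the k cells with the highest accumulated evidence."""
        flat = np.argsort(self.grid, axis=None)[::-1][:k]
        return [np.unravel_index(i, self.grid.shape) for i in flat]
```

Because rays cast from different poses intersect near a true source, repeated observations concentrate evidence there, which is one intuition for why such a map can help disentangle multiple simultaneous sources.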
Related papers
- SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments [14.179677726976056] (2023-09-08)
  SayNav is a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks.
  SayNav achieves state-of-the-art results and even outperforms an oracle-based baseline with strong ground-truth assumptions by more than 8% in terms of success rate.
- Towards Versatile Embodied Navigation [120.73460380993305] (2022-10-30)
  VIENNA is a versatile embodied navigation agent that simultaneously learns to perform four navigation tasks with one model.
  We empirically demonstrate that, compared with learning each visual navigation task individually, our agent achieves comparable or even better performance with reduced complexity.
- AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments [60.98664330268192] (2022-10-14)
  We present AVLEN -- an interactive agent for Audio-Visual-Language Embodied Navigation.
  The goal of AVLEN is to localize an audio event by navigating the 3D visual world.
  To realize these abilities, AVLEN uses a multimodal hierarchical reinforcement learning backbone.
- Towards Generalisable Audio Representations for Audio-Visual Navigation [18.738943602529805] (2022-06-01)
  In audio-visual navigation (AVN), an intelligent agent needs to navigate to a constantly sound-making object in complex 3D environments.
  We propose a contrastive learning-based method to tackle this challenge by regularising the audio encoder.
- Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation [97.17517060585875] (2022-02-05)
  We present a unified approach to visual navigation using a novel modular transfer learning model.
  Our model can effectively leverage its experience from one source task and apply it to multiple target tasks.
  Our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
- Explore before Moving: A Feasible Path Estimation and Memory Recalling Framework for Embodied Navigation [117.26891277593205] (2021-10-16)
  We focus on navigation and address the problem that existing navigation algorithms lack experience and common sense.
  Inspired by the human ability to think twice before moving and to conceive several feasible paths toward a goal in unfamiliar scenes, we present a route-planning method named the Path Estimation and Memory Recalling (PEMR) framework.
  We show strong experimental results of PEMR on the EmbodiedQA navigation task.
- MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation [23.877609358505268] (2020-12-07)
  Recent work shows that map-like memory is useful for long-horizon navigation tasks.
  We propose the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment.
  We examine how a variety of agent models perform across a spectrum of navigation task complexities.
- Learning to Set Waypoints for Audio-Visual Navigation [89.42192208471735] (2020-08-21)
  In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source.
  Existing models learn to act at a fixed granularity of agent motion and rely on simple recurrent aggregations of the audio observations.
  We introduce a reinforcement learning approach to audio-visual navigation with two key novel elements.
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325] (2020-07-15)
  Vision-language navigation (VLN) is the task of requiring an agent to carry out navigational instructions inside photo-realistic environments.
  One of the key challenges in VLN is how to conduct robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
  This work draws inspiration from human navigation behavior and endows an agent with an active information-gathering ability for a more intelligent VLN policy.
This list is automatically generated from the titles and abstracts of the papers in this site.