Object Memory Transformer for Object Goal Navigation
- URL: http://arxiv.org/abs/2203.14708v1
- Date: Thu, 24 Mar 2022 09:16:56 GMT
- Title: Object Memory Transformer for Object Goal Navigation
- Authors: Rui Fukushima, Kei Ota, Asako Kanezaki, Yoko Sasaki, Yusuke Yoshiyasu
- Abstract summary: This paper presents a reinforcement learning method for object goal navigation (ObjNav).
An agent navigates in 3D indoor environments to reach a target object based on long-term observations of objects and scenes.
To the best of our knowledge, this is the first work that uses a long-term memory of object semantics in a goal-oriented navigation task.
- Score: 10.359616364592075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a reinforcement learning method for object goal
navigation (ObjNav) where an agent navigates in 3D indoor environments to reach
a target object based on long-term observations of objects and scenes. To this
end, we propose Object Memory Transformer (OMT) that consists of two key ideas:
1) an Object-Scene Memory (OSM) that stores long-term scene and object
semantics, and 2) a Transformer that attends to salient objects in the sequence
of previously observed scenes and objects stored in OSM. This mechanism allows
the agent to efficiently navigate in the indoor environment without prior
knowledge about the environments, such as topological maps or 3D meshes. To the
best of our knowledge, this is the first work that uses a long-term memory of
object semantics in a goal-oriented navigation task. Experimental results
conducted on the AI2-THOR dataset show that OMT outperforms previous approaches
in navigating in unknown environments. In particular, we show that utilizing
the long-term object semantics information improves the efficiency of
navigation.
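The OSM-plus-Transformer idea described above can be sketched in a few lines: a fixed-size FIFO buffer holds past scene/object embeddings, and the current observation attends over that buffer to produce a context vector. This is a minimal NumPy illustration under assumed shapes and single-head scaled dot-product attention; class and function names are illustrative, not from the paper's code.

```python
import numpy as np

class ObjectSceneMemory:
    """Fixed-size FIFO buffer of past scene/object embeddings (illustrative sketch)."""
    def __init__(self, capacity, dim):
        self.capacity, self.dim = capacity, dim
        self.buffer = []

    def store(self, embedding):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)          # discard the oldest observation
        self.buffer.append(embedding)

    def as_matrix(self):
        return np.stack(self.buffer)    # shape (T, dim)

def attend(query, memory):
    """Scaled dot-product attention of the current observation over stored memory."""
    scores = memory @ query / np.sqrt(query.shape[-1])   # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over time steps
    return weights @ memory                              # (dim,) context vector

rng = np.random.default_rng(0)
osm = ObjectSceneMemory(capacity=16, dim=8)
for _ in range(20):                     # 20 steps; only the last 16 are kept
    osm.store(rng.normal(size=8))
context = attend(rng.normal(size=8), osm.as_matrix())
print(len(osm.buffer), context.shape)   # 16 (8,)
```

The context vector would then feed the agent's policy network alongside the current observation, letting previously seen objects bias action selection.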
Related papers
- Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments [44.6372390798904]
We propose a new task, termed Personalized Instance-based Navigation (PIN), in which an embodied agent is tasked with locating and reaching a specific personal object.
In each episode, the target object is presented to the agent using two modalities: a set of visual reference images on a neutral background and manually annotated textual descriptions.
arXiv Detail & Related papers (2024-10-23T18:01:09Z)
- SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation [83.4599149936183]
Existing zero-shot object navigation methods prompt an LLM with the text of spatially close objects.
We propose to represent the observed scene with 3D scene graph.
We conduct extensive experiments on MP3D, HM3D and RoboTHOR environments, where SG-Nav surpasses previous state-of-the-art zero-shot methods by more than 10% SR on all benchmarks.
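Representing the observed scene as a graph and handing it to an LLM amounts to serializing nodes and relations into text. A minimal sketch of that step follows; the node/edge schema and prompt wording are assumptions for illustration, not SG-Nav's actual format.

```python
# A tiny scene graph: objects as nodes, spatial relations as labeled edges.
scene_graph = {
    "nodes": ["sofa", "tv", "coffee table"],
    "edges": [("tv", "faces", "sofa"), ("coffee table", "in front of", "sofa")],
}

def graph_to_prompt(graph, goal):
    """Serialize a scene graph into a natural-language prompt for an LLM."""
    facts = [f"{s} {r} {o}" for s, r, o in graph["edges"]]
    return (f"Observed objects: {', '.join(graph['nodes'])}. "
            f"Relations: {'; '.join(facts)}. "
            f"Which object should I move toward to find a {goal}?")

prompt = graph_to_prompt(scene_graph, "remote control")
print(prompt)
```

The graph form lets the prompt carry relational structure (what is near what) rather than a flat object list.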
arXiv Detail & Related papers (2024-10-10T17:57:19Z) - Prioritized Semantic Learning for Zero-shot Instance Navigation [2.537056548731396]
We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training.
We propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents.
Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot ObjectNav in terms of success rate and is also superior on the new InstanceNav task.
arXiv Detail & Related papers (2024-03-18T10:45:50Z) - Right Place, Right Time! Towards ObjectNav for Non-Stationary Goals [55.581423861790945]
We present a novel approach to tackle the ObjectNav task for non-stationary and potentially occluded targets in an indoor environment.
We present its formulation, feasibility, and a navigation benchmark using a novel memory-enhanced LLM-based policy.
arXiv Detail & Related papers (2024-03-14T22:33:22Z) - Localizing Active Objects from Egocentric Vision with Symbolic World
Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability on localizing the active objects by: learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z) - A Contextual Bandit Approach for Learning to Plan in Environments with
Probabilistic Goal Configurations [20.15854546504947]
We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects.
Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty.
We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.
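"Optimism in the face of uncertainty" is the classic bandit exploration principle: rarely visited options get an inflated value bonus, so the agent keeps trying them until their estimates tighten. A minimal UCB1 sketch follows; the search locations, reward probabilities, and exploration constant are illustrative stand-ins, not the paper's actual contextual-bandit formulation.

```python
import math, random

def ucb1_choice(counts, values, t, c=1.4):
    """Pick the arm with the highest optimistic value estimate (UCB1)."""
    best, best_score = None, -float("inf")
    for arm in counts:
        if counts[arm] == 0:
            return arm                       # always try an untried location first
        bonus = c * math.sqrt(math.log(t) / counts[arm])
        score = values[arm] + bonus          # empirical mean + uncertainty bonus
        if score > best_score:
            best, best_score = arm, score
    return best

random.seed(0)
arms = ["kitchen", "living room", "bedroom"]
true_p = {"kitchen": 0.7, "living room": 0.2, "bedroom": 0.4}  # hidden success rates
counts = {a: 0 for a in arms}
values = {a: 0.0 for a in arms}
for t in range(1, 501):
    arm = ucb1_choice(counts, values, t)
    reward = 1.0 if random.random() < true_p[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # running mean update
print(max(counts, key=counts.get))
```

Over many episodes the bonus shrinks for well-sampled locations, so exploration concentrates on the genuinely promising ones.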
arXiv Detail & Related papers (2022-11-29T15:48:54Z) - Object Goal Navigation using Data Regularized Q-Learning [9.65323691689801]
Object Goal Navigation requires a robot to find and navigate to an instance of a target object class in a previously unseen environment.
Our framework incrementally builds a semantic map of the environment over time, and then repeatedly selects a long-term goal.
Long-term goal selection is formulated as a vision-based deep reinforcement learning problem.
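The incremental-map-then-goal-selection loop can be sketched simply: observed cells are written into a grid semantic map, and a policy picks a long-term goal among unexplored cells. In this minimal illustration the learned vision-based value function is replaced by a hand-written stub, and the grid layout, labels, and epsilon-greedy rule are assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
H = W = 10
semantic_map = np.full((H, W), "", dtype=object)   # "" marks an unexplored cell

# Incremental map update: write each newly observed object into its cell.
for (r, c), label in {(2, 3): "chair", (5, 5): "table", (7, 1): "chair"}.items():
    semantic_map[r, c] = label

def goal_score(cell):
    """Stub for the learned value function: closeness to any observed chair."""
    chairs = np.argwhere(semantic_map == "chair")
    return -min(abs(cell[0] - r) + abs(cell[1] - c) for r, c in chairs)

def select_goal(epsilon=0.1):
    """Epsilon-greedy long-term goal selection over unexplored cells."""
    frontier = [tuple(x) for x in np.argwhere(semantic_map == "")]
    if rng.random() < epsilon:
        return frontier[rng.integers(len(frontier))]  # explore a random cell
    return max(frontier, key=goal_score)              # exploit the value estimate

goal = select_goal()
print(goal)
```

In the full system the stub would be a trained network scoring candidate goals from the map and visual features, and a local planner would then drive the robot toward the selected cell.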
arXiv Detail & Related papers (2022-08-27T13:26:30Z) - MeMOT: Multi-Object Tracking with Memory [97.48960039220823]
Our model, called MeMOT, consists of three main modules that are all Transformer-based.
MeMOT observes very competitive performance on widely adopted MOT datasets.
arXiv Detail & Related papers (2022-03-31T02:33:20Z) - Object Goal Navigation using Goal-Oriented Semantic Exploration [98.14078233526476]
This work studies the problem of object goal navigation which involves navigating to an instance of the given object category in unseen environments.
We propose a modular system called 'Goal-Oriented Semantic Exploration', which builds an episodic semantic map and uses it to explore the environment efficiently.
arXiv Detail & Related papers (2020-07-01T17:52:32Z) - ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to
Objects [119.46959413000594]
This document summarizes the consensus recommendations of a working group on ObjectNav.
We make recommendations on subtle but important details of evaluation criteria.
We provide a detailed description of the instantiation of these recommendations in challenges organized at the Embodied AI workshop at CVPR 2020.
arXiv Detail & Related papers (2020-06-23T17:18:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.