Auxiliary Tasks and Exploration Enable ObjectNav
- URL: http://arxiv.org/abs/2104.04112v1
- Date: Thu, 8 Apr 2021 23:03:21 GMT
- Title: Auxiliary Tasks and Exploration Enable ObjectNav
- Authors: Joel Ye, Dhruv Batra, Abhishek Das, and Erik Wijmans
- Abstract summary: We re-enable a generic learned agent by adding auxiliary learning tasks and an exploration reward.
Our agents achieve 24.5% success and 8.1% SPL, a 37% and 8% relative improvement over prior state-of-the-art, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ObjectGoal Navigation (ObjectNav) is an embodied task wherein agents are to
navigate to an object instance in an unseen environment. Prior works have shown
that end-to-end ObjectNav agents that use vanilla visual and recurrent modules,
e.g. a CNN+RNN, perform poorly due to overfitting and sample inefficiency. This
has motivated current state-of-the-art methods to mix analytic and learned
components and operate on explicit spatial maps of the environment. We instead
re-enable a generic learned agent by adding auxiliary learning tasks and an
exploration reward. Our agents achieve 24.5% success and 8.1% SPL, a 37% and 8%
relative improvement over prior state-of-the-art, respectively, on the Habitat
ObjectNav Challenge. From our analysis, we propose that agents will act to
simplify their visual inputs so as to smooth their RNN dynamics, and that
auxiliary tasks reduce overfitting by minimizing effective RNN dimensionality;
i.e. a performant ObjectNav agent that must maintain coherent plans over long
horizons does so by learning smooth, low-dimensional recurrent dynamics. Site:
https://joel99.github.io/objectnav/
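To make the abstract's quantitative terms concrete, here is a minimal NumPy sketch of the standard SPL metric (Success weighted by Path Length), a coverage-style exploration bonus of the kind the abstract alludes to, and one plausible reading of "effective RNN dimensionality" as the number of principal components needed to explain most hidden-state variance. The function names, cell size, bonus value, and 90% variance threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def spl(successes, shortest_paths, agent_paths):
    """Success weighted by Path Length: SPL = mean_i(S_i * l_i / max(p_i, l_i))."""
    s = np.asarray(successes, dtype=float)       # S_i in {0, 1}
    l = np.asarray(shortest_paths, dtype=float)  # geodesic shortest-path length l_i
    p = np.asarray(agent_paths, dtype=float)     # length of the agent's actual path p_i
    return float(np.mean(s * l / np.maximum(p, l)))

def coverage_bonus(visited_cells, xy, cell_size=2.5, bonus=0.25):
    """Illustrative coverage-style exploration reward: pay `bonus` the first
    time the agent enters a grid cell (cell size and bonus are assumptions)."""
    cell = (int(xy[0] // cell_size), int(xy[1] // cell_size))
    if cell in visited_cells:
        return 0.0
    visited_cells.add(cell)
    return bonus

def effective_dimensionality(hidden_states, var_threshold=0.9):
    """One reading of 'effective RNN dimensionality': the number of principal
    components needed to explain `var_threshold` of hidden-state variance."""
    h = hidden_states - hidden_states.mean(axis=0, keepdims=True)  # (T, d)
    var = np.linalg.svd(h, compute_uv=False) ** 2  # PCA variances via SVD
    return int(np.searchsorted(np.cumsum(var) / var.sum(), var_threshold) + 1)

# Toy usage: three episodes, then hidden states from a 196-d recurrent agent.
print(spl([1, 1, 0], [5.0, 8.0, 6.0], [7.0, 8.0, 9.0]))           # ~0.571
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 32)) @ rng.normal(size=(32, 196))  # rank <= 32
print(effective_dimensionality(states))                           # small vs. 196
```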
Related papers
- Prioritized Semantic Learning for Zero-shot Instance Navigation
We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training.
We propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents.
Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot ObjectNav in terms of success rate and is also superior on the new InstanceNav task.
arXiv Detail & Related papers (2024-03-18T10:45:50Z)
- Right Place, Right Time! Towards ObjectNav for Non-Stationary Goals
We present a novel approach to tackle the ObjectNav task for non-stationary and potentially occluded targets in an indoor environment.
We present the task formulation, a feasibility analysis, and a navigation benchmark using a novel memory-enhanced LLM-based policy.
arXiv Detail & Related papers (2024-03-14T22:33:22Z)
- Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation
We aim to deepen our understanding of shortcut learning in ObjectNav.
We find that shortcut learning is the root cause of poor generalization: the agent learns to navigate to target objects simply by searching for the wall color associated with the target object's room.
Consequently, a state-of-the-art (SOTA) ObjectNav method generalizes poorly to environments where this color correlation does not hold.
arXiv Detail & Related papers (2024-02-07T18:44:27Z)
- SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments
SayNav is a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks.
SayNav achieves state-of-the-art results and even outperforms an oracle-based baseline with strong ground-truth assumptions by more than 8% in terms of success rate.
arXiv Detail & Related papers (2023-09-08T02:24:37Z)
- OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav
We present a single neural network architecture that achieves state-of-the-art results on both the ImageNav and ObjectNav tasks.
Such general-purpose methods offer advantages of simplicity in design, positive scaling with available compute, and versatile applicability to multiple tasks.
arXiv Detail & Related papers (2023-03-14T11:15:37Z)
- Pushing it out of the Way: Interactive Visual Navigation
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities; a sketch of an action-conditioned forward model in this spirit follows this entry.
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
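The NIE entry above describes predicting environment changes from actions only at a high level; below is a hedged PyTorch sketch of an action-conditioned forward model in that spirit. The class name, layer sizes, and the object-pose-delta target are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ForwardChangePredictor(nn.Module):
    """Illustrative NIE-style module: given a visual state embedding and a
    discrete action, predict the change (delta) in an object's pose caused
    by acting. All sizes and the pose-delta target are sketch assumptions."""
    def __init__(self, state_dim=512, num_actions=6, pose_dim=6):
        super().__init__()
        self.action_embed = nn.Embedding(num_actions, 32)
        self.net = nn.Sequential(
            nn.Linear(state_dim + 32, 256), nn.ReLU(),
            nn.Linear(256, pose_dim),  # predicted (dx, dy, dz, droll, dpitch, dyaw)
        )

    def forward(self, state, action):
        a = self.action_embed(action)                   # (B, 32)
        return self.net(torch.cat([state, a], dim=-1))  # (B, pose_dim)

# Toy usage: regress onto observed pose changes with an L2 loss.
model = ForwardChangePredictor()
state = torch.randn(8, 512)         # batch of visual embeddings
action = torch.randint(0, 6, (8,))  # batch of discrete actions
target_delta = torch.randn(8, 6)    # observed pose changes (placeholder data)
loss = nn.functional.mse_loss(model(state, action), target_delta)
loss.backward()
```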
- ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments
We combine Vision-and-Language Navigation, assembling of collected objects, and object referring expression comprehension, to create a novel joint navigation-and-assembly task, named ArraMon.
During this task, the agent is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment.
We present results for several baseline models (integrated and biased) and metrics (nDTW, CTC, rPOD, and PTC), and the large model-human performance gap demonstrates that our task is challenging and presents a wide scope for future work; a sketch of the nDTW metric follows this entry.
arXiv Detail & Related papers (2020-11-15T23:30:36Z)
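Of the ArraMon metrics above, nDTW (normalized Dynamic Time Warping) is the most widely used; below is a minimal sketch under its usual definition, exp(-DTW(agent, reference) / (|reference| * d_th)), where d_th is the success threshold. The threshold value and toy paths here are illustrative; CTC, rPOD, and PTC are task-specific and not sketched.

```python
import math

def ndtw(agent_path, ref_path, success_dist=3.0):
    """Hedged sketch of normalized Dynamic Time Warping (nDTW):
    exp(-DTW(path, reference) / (|reference| * success_dist)).
    Paths are lists of (x, y) positions; `success_dist` is the task's
    success threshold (the value here is an assumption)."""
    n, m = len(agent_path), len(ref_path)
    INF = float("inf")
    # Classic O(n*m) dynamic-programming DTW over Euclidean point distances.
    dtw = [[INF] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(agent_path[i - 1], ref_path[j - 1])
            dtw[i][j] = d + min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
    return math.exp(-dtw[n][m] / (m * success_dist))

# Toy usage: a path that hugs the reference scores near 1.0.
ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(ndtw([(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)], ref))  # close to 1
print(ndtw([(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)], ref))  # much lower
```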
- Exploiting Scene-specific Features for Object Goal Navigation
We introduce a new reduced dataset that speeds up the training of navigation models.
Our proposed dataset permits training, in reasonable time, models that do not exploit online-built maps.
We propose the SMTSC model, an attention-based model capable of exploiting the correlation between scenes and the objects contained in them.
arXiv Detail & Related papers (2020-08-21T10:16:01Z)
- Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes.
Our proposed method combines visual features and 3D spatial representations to learn a navigation policy.
Our experiments, performed in the AI2-THOR environment, show that our model outperforms the baselines on both the SR and SPL metrics.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.