Hierarchical Object-to-Zone Graph for Object Navigation
- URL: http://arxiv.org/abs/2109.02066v2
- Date: Thu, 9 Sep 2021 08:59:23 GMT
- Title: Hierarchical Object-to-Zone Graph for Object Navigation
- Authors: Sixian Zhang, Xinhang Song, Yubing Bai, Weijie Li, Yakui Chu, Shuqiang
Jiang
- Abstract summary: In an unseen environment, when the target object is not in the egocentric view, the agent may be unable to make informed decisions.
We propose a hierarchical object-to-zone (HOZ) graph to guide the agent in a coarse-to-fine manner.
An online-learning mechanism is also proposed to update the HOZ graph according to real-time observations in new environments.
- Score: 43.558927774552295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of object navigation is to reach a target object using visual
information in unseen environments. Previous works usually train deep models to
predict an agent's actions in real time. However, in an unseen environment, when
the target object is not in the egocentric view, the agent may be unable to make
informed decisions due to the lack of guidance. In this paper, we propose a
hierarchical object-to-zone (HOZ) graph to guide the agent in a coarse-to-fine
manner, along with an online-learning mechanism that updates the HOZ graph
according to real-time observations in new environments. In particular, the HOZ
graph is composed of scene nodes, zone nodes and object nodes. Given the
pre-learned HOZ graph, the real-time observation and the target goal, the agent
can continually plan an optimal path from zone to zone. The next potential zone
on the estimated path is regarded as a sub-goal, which is also fed into the deep
reinforcement learning model for action prediction. Our method is evaluated on
the AI2-Thor simulator. In addition to the widely used evaluation metrics SR and
SPL, we also propose a new metric, SAE, that focuses on the effective action
rate. Experimental results demonstrate the effectiveness and efficiency of our
proposed method.
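To make the coarse-to-fine guidance concrete, the sketch below illustrates how a pre-learned HOZ graph could be queried at each step: zone nodes are assumed to carry object co-occurrence distributions and weighted edges to adjacent zones, the agent localizes itself to a zone from its egocentric observation, and the next zone on the shortest path toward the zone most likely to contain the target is returned as the sub-goal for the policy. This is a minimal sketch under those assumptions; the class and method names (HOZGraph, localize_zone, next_subgoal) are illustrative and not the authors' implementation.

```python
# Illustrative sketch of zone-to-zone sub-goal selection with an object-to-zone
# graph. Names and data layout are assumptions, not the paper's actual code.
import heapq
from dataclasses import dataclass, field


@dataclass
class Zone:
    """A zone node: a distribution over object categories seen in that zone."""
    zone_id: int
    object_dist: dict[str, float]                               # category -> frequency
    neighbors: dict[int, float] = field(default_factory=dict)   # zone_id -> edge cost


class HOZGraph:
    def __init__(self, zones: list[Zone]):
        self.zones = {z.zone_id: z for z in zones}

    def localize_zone(self, observed_objects: set[str]) -> int:
        """Pick the zone whose object distribution best matches the current view."""
        def overlap(zone: Zone) -> float:
            return sum(zone.object_dist.get(o, 0.0) for o in observed_objects)
        return max(self.zones.values(), key=overlap).zone_id

    def target_zone(self, target_object: str) -> int:
        """Pick the zone most likely to contain the target object."""
        return max(self.zones.values(),
                   key=lambda z: z.object_dist.get(target_object, 0.0)).zone_id

    def next_subgoal(self, current: int, target_object: str) -> int:
        """Shortest zone-to-zone path (Dijkstra); return the next zone as sub-goal."""
        goal = self.target_zone(target_object)
        dist = {current: 0.0}
        prev: dict[int, int] = {}
        queue = [(0.0, current)]
        while queue:
            d, u = heapq.heappop(queue)
            if u == goal:
                break
            if d > dist.get(u, float("inf")):
                continue
            for v, cost in self.zones[u].neighbors.items():
                nd = d + cost
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(queue, (nd, v))
        # Walk back from the goal to find the first step out of the current zone.
        step = goal
        while prev.get(step, current) != current:
            step = prev[step]
        return step


# Usage: localize the agent, then hand the next zone to the policy as a sub-goal.
kitchen = Zone(0, {"Fridge": 0.4, "Microwave": 0.3}, neighbors={1: 1.0})
hallway = Zone(1, {"Painting": 0.6}, neighbors={0: 1.0, 2: 1.0})
bedroom = Zone(2, {"Bed": 0.7, "AlarmClock": 0.2}, neighbors={1: 1.0})
hoz = HOZGraph([kitchen, hallway, bedroom])
current = hoz.localize_zone({"Fridge"})
subgoal = hoz.next_subgoal(current, "AlarmClock")   # -> zone 1 (hallway) first
```

In the paper's pipeline this sub-goal is an additional input to the reinforcement learning policy rather than a hard waypoint, and the zone statistics are refreshed online as new observations arrive.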
Related papers
- Goal-conditioned Offline Planning from Curious Exploration [28.953718733443143]
We consider the challenge of extracting goal-conditioned behavior from the products of unsupervised exploration techniques.
We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting.
To mitigate these shortcomings, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme.
arXiv Detail & Related papers (2023-11-28T17:48:18Z)
- Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation [15.623723522165731]
We introduce a novel framework centered around the Probable Object Location (POLo) score.
We further enhance the framework's practicality by introducing POLoNet, a neural network trained to approximate the computationally intensive POLo score.
Our experiments, involving the first phase of the OVMM 2023 challenge, demonstrate that an agent equipped with POLoNet significantly outperforms a range of baseline methods.
arXiv Detail & Related papers (2023-11-14T08:45:32Z)
- Object-centric Video Representation for Long-term Action Anticipation [33.115854386196126]
The key motivation is that objects provide important cues for recognizing and predicting human-object interactions.
We propose to build object-centric video representations by leveraging visual-language pretrained models.
To recognize and predict human-object interactions, we use a Transformer-based neural architecture.
arXiv Detail & Related papers (2023-10-31T22:54:31Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize the active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- ReVoLT: Relational Reasoning and Voronoi Local Graph Planning for Target-driven Navigation [1.0896567381206714]
Embodied AI is an inevitable trend that emphasizes the interaction between intelligent entities and the real world.
Recent works focus on exploiting layout relationships with graph neural networks (GNNs).
We decouple this task and propose ReVoLT, a hierarchical framework.
arXiv Detail & Related papers (2023-01-06T05:19:56Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Online Grounding of PDDL Domains by Acting and Sensing in Unknown Environments [62.11612385360421]
This paper proposes a framework that allows an agent to perform different tasks.
We integrate machine learning models to abstract the sensory data, symbolic planning for goal achievement and path planning for navigation.
We evaluate the proposed method in accurate simulated environments, where the sensors are RGB-D on-board camera, GPS and compass.
arXiv Detail & Related papers (2021-12-18T21:48:20Z)
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
- Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation [18.519303422753534]
We argue that this ability is largely due to three main reasons: the incorporation of prior knowledge (or experience), the adaptation of it to the new environment using observed visual cues, and optimistically searching without giving up early.
These capabilities are currently missing in state-of-the-art visual navigation methods based on Reinforcement Learning (RL).
In this paper, we propose to use externally learned prior knowledge of the relative object locations and integrate it into our model by constructing a neural graph.
arXiv Detail & Related papers (2020-04-07T09:31:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.