Explore before Moving: A Feasible Path Estimation and Memory Recalling
Framework for Embodied Navigation
- URL: http://arxiv.org/abs/2110.08571v1
- Date: Sat, 16 Oct 2021 13:30:55 GMT
- Title: Explore before Moving: A Feasible Path Estimation and Memory Recalling
Framework for Embodied Navigation
- Authors: Yang Wu, Shirui Feng, Guanbin Li, Liang Lin
- Abstract summary: We focus on navigation and address the problem that existing navigation algorithms lack experience and common sense.
Inspired by the human ability to think twice before moving and conceive several feasible paths to seek a goal in unfamiliar scenes, we present a route planning method named Path Estimation and Memory Recalling framework.
We show strong experimental results of PEMR on the EmbodiedQA navigation task.
- Score: 117.26891277593205
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An embodied task such as embodied question answering (EmbodiedQA) requires
an agent to explore the environment and collect clues to answer a given
question related to specific objects in the scene. The solution to such a
task usually comprises two stages: a navigator and a visual Q&A module. In this
paper, we focus on the navigation stage and address the problem that existing
navigation algorithms lack experience and common sense, which essentially results
in a failure to find the target when the robot is spawned in unknown environments.
Inspired by the human ability to think twice before moving and conceive
several feasible paths to seek a goal in unfamiliar scenes, we present a route
planning method named Path Estimation and Memory Recalling (PEMR) framework.
PEMR includes a "looking ahead" process, i.e., a visual feature
extractor module that estimates feasible paths for gathering 3D navigational
information, mimicking the human sense of direction. PEMR contains
another, "looking behind", process: a memory recall mechanism that
aims at fully leveraging the past experience collected by the feature extractor.
Last but not least, to encourage the navigator to learn more accurate prior
expert experience, we improve the original benchmark dataset and provide a
family of evaluation metrics for diagnosing both navigation and question
answering modules. We show strong experimental results of PEMR on the
EmbodiedQA navigation task.
Related papers
- Hierarchical end-to-end autonomous navigation through few-shot waypoint detection [0.0]
Human navigation is facilitated through the association of actions with landmarks.
Current autonomous navigation schemes rely on accurate positioning devices and algorithms as well as extensive streams of sensory data collected from the environment.
We propose a hierarchical end-to-end meta-learning scheme that enables a mobile robot to navigate in a previously unknown environment.
arXiv Detail & Related papers (2024-09-23T00:03:39Z)
- Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation [11.667940255053582]
This paper uses the RGB and depth information of the training scene to pretrain the feature extractor, which improves navigation efficiency.
We evaluated our method on AI2-Thor and RoboTHOR and demonstrated that it significantly outperforms state-of-the-art (SOTA) methods on success rate and navigation efficiency.
arXiv Detail & Related papers (2024-06-20T08:35:10Z)
- Explore until Confident: Efficient Exploration for Embodied Question Answering [32.27111287314288]
We leverage the strong semantic reasoning capabilities of large vision-language models to efficiently explore and answer questions.
We propose a method that first builds a semantic map of the scene based on depth information and via visual prompting of a VLM.
Next, we use conformal prediction to calibrate the VLM's question answering confidence, allowing the robot to know when to stop exploration.
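The calibration idea mentioned above can be sketched as follows. This is a minimal, illustrative implementation of split conformal prediction, not the paper's actual code: the function names, the nonconformity score (one minus the model's probability for an answer), and the stopping criterion (the calibrated prediction set collapsing to a single answer) are all assumptions made for the sketch.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Compute the split-conformal quantile from calibration scores.

    cal_scores: nonconformity score of the true answer for each
    calibration example (assumed here to be 1 - model probability).
    alpha: target miscoverage rate (e.g. 0.1 for 90% coverage).
    """
    n = len(cal_scores)
    # Finite-sample corrected rank, clipped for small calibration sets.
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    return sorted(cal_scores)[k - 1]

def prediction_set(answer_probs, qhat):
    """Keep every candidate answer whose nonconformity is within qhat."""
    return {a for a, p in answer_probs.items() if 1.0 - p <= qhat}

def should_stop_exploring(answer_probs, qhat):
    """Stop once the calibrated prediction set contains a single answer."""
    return len(prediction_set(answer_probs, qhat)) == 1
```

In this sketch the robot would keep exploring while several answers remain plausible and halt once the prediction set is a singleton, which is one plausible reading of "knowing when to stop exploration".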
arXiv Detail & Related papers (2024-03-23T22:04:03Z)
- Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation [88.84058353659107]
Instance ImageGoal Navigation (IIN) aims to navigate to a specified object depicted by a goal image in an unexplored environment.
We propose a new modular navigation framework named Instance-aware Exploration-Verification-Exploitation (IEVE) for instance-level image goal navigation.
Our method surpasses previous state-of-the-art work, with a classical segmentation model (0.684 vs. 0.561 success) or a robust model (0.702 vs. 0.561 success).
arXiv Detail & Related papers (2024-02-25T07:59:10Z)
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes [75.20435924081585]
JPerceiver can simultaneously estimate scale-aware depth and VO as well as BEV layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, Nuscenes and KITTI show the superiority of JPerceiver over existing methods on all the above three tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z)
- Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
"Embodied visual navigation" problem requires an agent to navigate in a 3D environment mainly rely on its first-person observation.
This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)
- Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose a structured scene memory architecture for vision-language navigation (VLN).
It is compartmentalized enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z)
- Active Visual Information Gathering for Vision-Language Navigation [115.40768457718325]
Vision-language navigation (VLN) tasks an agent with carrying out navigational instructions inside photo-realistic environments.
One of the key challenges in VLN is how to conduct a robust navigation by mitigating the uncertainty caused by ambiguous instructions and insufficient observation of the environment.
This work draws inspiration from human navigation behavior and endows an agent with an active information gathering ability for a more intelligent VLN policy.
arXiv Detail & Related papers (2020-07-15T23:54:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.