Related papers: Landmark Policy Optimization for Object Navigation Task

Landmark Policy Optimization for Object Navigation Task

URL: http://arxiv.org/abs/2109.09512v1
Date: Fri, 17 Sep 2021 12:28:46 GMT
Title: Landmark Policy Optimization for Object Navigation Task
Authors: Aleksey Staroverov, Aleksandr I. Panov
Abstract summary: This work studies object goal navigation task, which involves navigating to the closest object related to the given semantic category in unseen environments. Recent works have shown significant achievements both in the end-to-end Reinforcement Learning approach and modular systems, but need a big step forward to be robust and optimal. We propose a hierarchical method that incorporates standard task formulation and additional area knowledge as landmarks, with a way to extract these landmarks.
Score: 77.34726150561087
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work studies object goal navigation task, which involves navigating to the closest object related to the given semantic category in unseen environments. Recent works have shown significant achievements both in the end-to-end Reinforcement Learning approach and modular systems, but need a big step forward to be robust and optimal. We propose a hierarchical method that incorporates standard task formulation and additional area knowledge as landmarks, with a way to extract these landmarks. In a hierarchy, a low level consists of separately trained algorithms to the most intuitive skills, and a high level decides which skill is needed at this moment. With all proposed solutions, we achieve a 0.75 success rate in a realistic Habitat simulator. After a small stage of additional model training in a reconstructed virtual area at a simulator, we successfully confirmed our results in a real-world case.

Related papers

SemNav: A Model-Based Planner for Zero-Shot Object Goal Navigation Using Vision-Foundation Models [10.671262416557704]
Vision Foundation Models (VFMs) offer powerful capabilities for visual understanding and reasoning.<n>We present a zero-shot object goal navigation framework that integrates the perceptual strength of VFMs with a model-based planner.<n>We evaluate our approach on the HM3D dataset using the Habitat simulator and demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-06-04T03:04:54Z)
Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction [19.59151245929067]
We study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allow for more efficient learning. We find this problem is best solved hierarchically by modelling items at a higher level of state abstraction to pixels. We make use of this to propose a fully model-based algorithm that learns a discriminative world model.
arXiv Detail & Related papers (2024-08-21T17:59:31Z)
Embodied Instruction Following in Unknown Environments [64.57388036567461]
We propose an embodied instruction following (EIF) method for complex tasks in the unknown environment.<n>We build a hierarchical embodied instruction following framework including the high-level task planner and the low-level exploration controller.<n>For the task planner, we generate the feasible step-by-step plans for human goal accomplishment according to the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z)
Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation [11.510823733292519]
We propose a data-driven, modular-based approach, trained on a dataset that incorporates common-sense knowledge of object-to-room relationships extracted from a large language model. The results in the Habitat simulator demonstrate that our framework outperforms the baseline by an average of 10.6% in the efficiency metric, Success weighted by Path Length (SPL).
arXiv Detail & Related papers (2024-03-21T06:32:36Z)
Multi-Object Navigation in real environments using hybrid policies [18.52681391843433]
We introduce a hybrid navigation method, which decomposes the problem into two different skills. We show the advantages of this approach compared to end-to-end methods both in simulation and a real environment.
arXiv Detail & Related papers (2024-01-24T20:41:25Z)
How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI. Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework. Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z)
Navigating to Objects in the Real World [76.1517654037993]
We present a large-scale empirical study of semantic visual navigation methods comparing methods from classical, modular, and end-to-end learning approaches. We find that modular learning works well in the real world, attaining a 90% success rate. In contrast, end-to-end learning does not, dropping from 77% simulation to 23% real-world success rate due to a large image domain gap between simulation and reality.
arXiv Detail & Related papers (2022-12-02T01:10:47Z)
Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks. For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills. We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
arXiv Detail & Related papers (2021-11-04T22:46:16Z)
Efficient Robotic Object Search via HIEM: Hierarchical Policy Learning with Intrinsic-Extrinsic Modeling [33.89793938441333]
We present a novel policy learning paradigm for the object search task, based on hierarchical and interpretable modeling with an intrinsic-extrinsic reward setting. Experiments conducted on the House3D environment validate and show that the robot, trained with our model, can perform the object search task in a more optimal and interpretable way.
arXiv Detail & Related papers (2020-10-16T19:21:38Z)
Object Goal Navigation using Goal-Oriented Semantic Exploration [98.14078233526476]
This work studies the problem of object goal navigation which involves navigating to an instance of the given object category in unseen environments. We propose a modular system called, Goal-Oriented Semantic Exploration' which builds an episodic semantic map and uses it to explore the environment efficiently.
arXiv Detail & Related papers (2020-07-01T17:52:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.