Goal-Guided Transformer-Enabled Reinforcement Learning for Efficient
Autonomous Navigation
- URL: http://arxiv.org/abs/2301.00362v2
- Date: Sun, 24 Sep 2023 15:25:02 GMT
- Title: Goal-Guided Transformer-Enabled Reinforcement Learning for Efficient
Autonomous Navigation
- Authors: Wenhui Huang, Yanxin Zhou, Xiangkun He, and Chen Lv
- Abstract summary: We present a Goal-guided Transformer-enabled reinforcement learning (GTRL) approach for goal-driven navigation.
Our approach guides the scene representation to focus mainly on goal-relevant features, which substantially enhances the data efficiency of the DRL learning process.
Both simulation and real-world experimental results demonstrate the superiority of our approach in terms of data efficiency, performance, robustness, and sim-to-real generalization.
- Score: 15.501449762687148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite some successful applications of goal-driven navigation, existing deep
reinforcement learning (DRL)-based approaches notoriously suffer from poor
data efficiency. One of the reasons is that the goal information is
decoupled from the perception module and directly introduced as a condition of
decision-making, so that the goal-irrelevant features of the scene
representation play an adversarial role during the learning process. In light
of this, we present a novel Goal-guided Transformer-enabled reinforcement
learning (GTRL) approach by considering the physical goal states as an input of
the scene encoder for guiding the scene representation to couple with the goal
information and realizing efficient autonomous navigation. More specifically,
we propose a novel variant of the Vision Transformer as the backbone of the
perception system, namely Goal-guided Transformer (GoT), and pre-train it with
expert priors to boost the data efficiency. Subsequently, a reinforcement
learning algorithm is instantiated for the decision-making system, taking the
goal-oriented scene representation from the GoT as the input and generating
decision commands. As a result, our approach guides the scene representation
to focus mainly on goal-relevant features, which substantially enhances
the data efficiency of the DRL learning process, leading to superior navigation
performance. Both simulation and real-world experimental results demonstrate the
superiority of our approach in terms of data efficiency, performance,
robustness, and sim-to-real generalization, compared with other
state-of-the-art (SOTA) baselines. The demonstration video
(https://www.youtube.com/watch?v=aqJCHcsj4w0) and the source code
(https://github.com/OscarHuangWind/DRL-Transformer-SimtoReal-Navigation) are
also provided.
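As a rough illustration of the core idea in the abstract above, the sketch below shows a minimal PyTorch-style goal-guided scene encoder: the goal state is embedded as an extra token of a small Vision-Transformer-style encoder, so attention can couple scene features with the goal before the representation reaches the policy. All module and parameter names here are illustrative assumptions, not the authors' actual implementation (see the linked source code for that).

```python
# Minimal sketch of a goal-guided transformer encoder feeding an RL policy head.
# Hypothetical names and sizes; not the GTRL/GoT reference implementation.
import torch
import torch.nn as nn


class GoalGuidedEncoder(nn.Module):
    def __init__(self, img_size=64, patch=8, dim=128, depth=4, heads=4, goal_dim=2):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding for the camera observation (e.g. RGB or depth image).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # The goal state (e.g. relative distance and heading) becomes one extra token.
        self.goal_embed = nn.Linear(goal_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, img, goal):
        tokens = self.patch_embed(img).flatten(2).transpose(1, 2)   # (B, N, dim)
        goal_tok = self.goal_embed(goal).unsqueeze(1)               # (B, 1, dim)
        x = torch.cat([goal_tok, tokens], dim=1) + self.pos
        x = self.encoder(x)
        # Use the goal token's output as the goal-oriented scene representation.
        return x[:, 0]


class PolicyHead(nn.Module):
    """Maps the goal-oriented representation to continuous velocity commands."""

    def __init__(self, dim=128, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh()
        )

    def forward(self, z):
        return self.net(z)


if __name__ == "__main__":
    enc, pi = GoalGuidedEncoder(), PolicyHead()
    img = torch.randn(1, 3, 64, 64)        # camera observation
    goal = torch.tensor([[3.0, 0.5]])      # relative (distance, heading) to the goal
    action = pi(enc(img, goal))            # e.g. linear and angular velocity
    print(action.shape)                    # torch.Size([1, 2])
```

In the full approach described above, such an encoder would be pre-trained with expert priors and the policy head would then be trained with a DRL algorithm on the goal-oriented representation.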
Related papers
- Causality-Aware Transformer Networks for Robotic Navigation [13.719643934968367]
Current research in Visual Navigation reveals opportunities for improvement.
Direct adoption of RNNs and Transformers often overlooks the specific differences between Embodied AI and traditional sequential data modelling.
We propose Causality-Aware Transformer (CAT) Networks for Navigation, featuring a Causal Understanding Module.
arXiv Detail & Related papers (2024-09-04T12:53:26Z)
- Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications [0.21051221444478305]
How to improve the ability of scene representation is a key issue in vision-oriented decision-making applications.
We propose an intrinsic dynamics-driven representation learning method with sequence models in visual reinforcement learning.
arXiv Detail & Related papers (2024-05-30T06:31:03Z)
- Vision-and-Language Navigation Generative Pretrained Transformer [0.0]
Vision-and-Language Navigation Generative Pretrained Transformer (VLN-GPT) adopts a transformer decoder model (GPT-2) to model trajectory sequence dependencies, bypassing the need for historical encoding modules.
Performance assessments on the VLN dataset reveal that VLN-GPT surpasses complex state-of-the-art encoder-based models.
arXiv Detail & Related papers (2024-05-27T09:42:04Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation [61.08389704326803]
Vision-and-language navigation (VLN) is the task to enable an embodied agent to navigate to a remote location following the natural language instruction in real scenes.
Most previous approaches utilize entire-scene features or object-centric features to represent navigable candidates.
We propose a Knowledge Enhanced Reasoning Model (KERM) to leverage knowledge to improve agent navigation ability.
arXiv Detail & Related papers (2023-03-28T08:00:46Z)
- Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z)
- Unsupervised Domain Adaptation for Video Transformers in Action Recognition [76.31442702219461]
We propose a simple and novel UDA approach for video action recognition.
Our approach builds a robust source model that better generalises to target domain.
We report results on two video action recognition benchmarks for UDA.
arXiv Detail & Related papers (2022-07-26T12:17:39Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
- MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation [4.127128889779478]
This work focuses on performing better or comparable to the existing learning-based solutions for visual navigation for autonomous agents.
We propose a method to encode vital scene semantics into a semantically informed, top-down egocentric map representation.
We conduct experiments on 3-D reconstructed indoor PointGoal visual navigation and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-03-21T12:01:23Z)