RE-MOVE: An Adaptive Policy Design for Robotic Navigation Tasks in
Dynamic Environments via Language-Based Feedback
- URL: http://arxiv.org/abs/2303.07622v2
- Date: Mon, 18 Sep 2023 02:18:27 GMT
- Authors: Souradip Chakraborty, Kasun Weerakoon, Prithvi Poddar, Mohamed Elnoor,
Priya Narayanan, Carl Busart, Pratap Tokekar, Amrit Singh Bedi, and Dinesh
Manocha
- Abstract summary: Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment.
We propose a novel approach called RE-MOVE to adapt an already-trained policy to real-time changes in the environment, without retraining, by utilizing language-based feedback.
- Score: 56.219221064727016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning-based policies for continuous control robotic
navigation tasks often fail to adapt to changes in the environment during
real-time deployment, which may result in catastrophic failures. To address
this limitation, we propose a novel approach called RE-MOVE (REquest help and
MOVE on) to adapt an already-trained policy to real-time changes in the
environment without retraining, by utilizing language-based feedback. The
proposed approach essentially boils down to addressing two main challenges:
(1) when to ask for feedback and, (2) if feedback is received, how to
incorporate it into the trained policy. RE-MOVE incorporates an epistemic
uncertainty-based framework to determine the optimal time to request
instruction-based feedback.
For the second challenge, we employ a zero-shot natural language
processing (NLP) paradigm with efficient prompt design, leveraging the
state-of-the-art GPT-3.5 and Llama-2 language models. To show the efficacy of the
proposed approach, we performed extensive synthetic and real-world evaluations
in several test-time dynamic navigation scenarios. Utilizing RE-MOVE results in
up to an 80% improvement in the attainment of successful goals, coupled with a
13.50% reduction in the normalized trajectory length, compared to
alternative approaches, particularly in demanding real-world environments with
perceptual challenges.
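As an illustration of the first challenge (when to ask for feedback), a minimal epistemic-uncertainty gate over an ensemble of policy heads might look like the sketch below. The variance-over-ensemble proxy and the threshold value are assumptions for illustration, not the paper's exact formulation:

```python
import statistics

def should_request_feedback(ensemble_actions, threshold):
    """Return True when epistemic uncertainty is high enough to pause
    navigation and request language-based feedback.

    Uses the variance of the actions proposed by an ensemble of policy
    heads as a simple proxy for epistemic uncertainty (illustrative).
    """
    uncertainty = statistics.pvariance(ensemble_actions)
    return uncertainty > threshold

# Ensemble heads strongly disagree on the steering command -> ask for help.
print(should_request_feedback([0.1, 0.9, -0.8, 0.7], threshold=0.2))    # True
# Heads agree -> keep moving without interrupting the operator.
print(should_request_feedback([0.50, 0.52, 0.48, 0.50], threshold=0.2))  # False
```

In a real deployment the ensemble would come from multiple value or policy heads trained with different seeds or dropout masks; the gate simply trades a small amount of operator attention for avoiding catastrophic failures.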
Related papers
- VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model [34.98047665907545]
We propose an environment-free RL framework that decouples value estimation from policy optimization.
The framework operates in two stages: (1) pretraining VEM to estimate long-term action utilities and (2) guiding policy exploration with frozen VEM signals.
Evaluated on Android-in-the-Wild benchmarks, VEM achieves state-of-the-art performance in both offline and online settings.
arXiv Detail & Related papers (2025-02-26T07:52:02Z)
- Training a Generally Curious Agent [86.84089201249104]
We present PAPRIKA, a fine-tuning approach that enables language models to develop general decision-making capabilities.
Experimental results show that models fine-tuned with PAPRIKA can effectively transfer their learned decision-making capabilities to entirely unseen tasks.
These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems.
arXiv Detail & Related papers (2025-02-24T18:56:58Z)
- Policy Learning with a Natural Language Action Space: A Causal Approach [24.096991077437146]
This paper introduces a novel causal framework for multi-stage decision-making in natural language action spaces.
Our approach employs Q-learning to estimate Dynamic Treatment Regimes (DTR) through a single model.
A key technical contribution of our approach is a decoding strategy that translates optimized embeddings back into coherent natural language.
arXiv Detail & Related papers (2025-02-24T17:26:07Z)
- COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping [56.907940167333656]
Occluded robot grasping refers to settings where the desired grasp poses are kinematically infeasible due to environmental constraints such as surface collisions.
Traditional robot manipulation approaches struggle with the complexity of non-prehensile or bimanual strategies commonly used by humans.
We introduce Constraint-based Manipulation for Bimanual Occluded Grasping (COMBO-Grasp), a learning-based approach which leverages two coordinated policies.
arXiv Detail & Related papers (2025-02-12T01:31:01Z)
- Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation [49.43094200366251]
We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition.
Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions.
We find that PALO is able to consistently complete long-horizon, multi-tier tasks in the real world, outperforming state-of-the-art pre-trained generalist policies.
arXiv Detail & Related papers (2024-08-29T03:03:35Z)
- Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments [13.163784646113214]
Continual Test-Time Adaptation (CTTA) has recently emerged as a promising technique to gradually adapt a source-trained model to continually changing target domains.
We present AMROD, featuring three core components. Firstly, the object-level contrastive learning module extracts object-level features for contrastive learning to refine the feature representation in the target domain.
Secondly, the adaptive monitoring module dynamically skips unnecessary adaptation and updates the category-specific threshold based on predicted confidence scores to enable efficiency and improve the quality of pseudo-labels.
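A hypothetical sketch of such confidence-gated adaptation is shown below; the exponential-moving-average threshold update and the per-category gating rule are illustrative assumptions, not AMROD's exact formulas:

```python
def update_threshold(old_threshold, confidence, momentum=0.9):
    """Exponential-moving-average update of a category-specific
    confidence threshold (illustrative)."""
    return momentum * old_threshold + (1 - momentum) * confidence

def should_adapt(confidences, thresholds):
    """Adapt only when some category's predicted confidence falls below
    its current threshold; otherwise skip the update to save compute
    and avoid noisy pseudo-labels."""
    return any(conf < thresholds[cat] for cat, conf in confidences.items())

# 'person' detections are uncertain -> run an adaptation step.
print(should_adapt({"car": 0.9, "person": 0.4},
                   {"car": 0.5, "person": 0.5}))  # True
# All categories are confident -> skip this adaptation step.
print(should_adapt({"car": 0.9, "person": 0.8},
                   {"car": 0.5, "person": 0.5}))  # False
```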
arXiv Detail & Related papers (2024-06-24T08:30:03Z)
- A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning [3.1515473193934778]
Off-dynamics Reinforcement Learning seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics.
We propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms.
arXiv Detail & Related papers (2023-12-24T13:09:08Z)
- CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation [73.78984332354636]
CorNav is a novel zero-shot framework for vision-and-language navigation.
It incorporates environmental feedback for refining future plans and adjusting its actions.
It consistently outperforms all baselines in a zero-shot multi-task setting.
arXiv Detail & Related papers (2023-06-17T11:44:04Z)
- Relative Policy-Transition Optimization for Fast Policy Transfer [18.966619060222634]
We consider the problem of policy transfer between two Markov Decision Processes (MDPs).
We propose two new algorithms, referred to as Relative Policy Optimization (RPO) and Relative Transition Optimization (RTO).
RPO transfers the policy evaluated in one environment to maximize the return in another, while RTO updates the parameterized dynamics model to reduce the gap between the dynamics of the two environments.
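As a toy illustration of the RTO idea of reducing the dynamics gap, consider fitting a one-dimensional linear dynamics model s' = theta * s to transitions observed in the target environment. The linear model and plain gradient step are assumptions for illustration, not the paper's actual algorithm:

```python
def rto_style_update(theta, transitions, lr=0.1):
    """One gradient step shrinking the squared-error gap between a
    linear dynamics model s' = theta * s and transitions (s, s')
    observed in the target environment (illustrative sketch)."""
    grad = sum(2 * (theta * s - s_next) * s for s, s_next in transitions)
    grad /= len(transitions)
    return theta - lr * grad

# Target environment follows s' = 2 * s; the model starts at theta = 1.
theta = rto_style_update(1.0, [(1.0, 2.0), (2.0, 4.0)])
print(theta)  # 1.5 -- one step closer to the target dynamics
```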
arXiv Detail & Related papers (2022-06-13T09:55:04Z)
- Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation [75.86581380817464]
A SimulST system generally includes two components: the pre-decision that aggregates the speech information and the policy that decides to read or write.
This paper proposes to model the adaptive policy by adapting the Continuous Integrate-and-Fire (CIF)
Compared with monotonic multihead attention (MMA), our method has the advantage of simpler computation, superior quality at low latency, and better generalization to long utterances.
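The integrate-and-fire mechanism behind CIF can be sketched as follows: accumulate per-frame weights and emit a read/write boundary each time the accumulator crosses a firing threshold beta. The weight values below are made up for illustration:

```python
def cif_boundaries(alphas, beta=1.0):
    """Continuous Integrate-and-Fire (illustrative): accumulate
    per-frame weights and fire a segment boundary whenever the
    accumulator reaches beta, carrying over the remainder."""
    boundaries, acc = [], 0.0
    for i, alpha in enumerate(alphas):
        acc += alpha
        if acc >= beta:
            boundaries.append(i)  # write a target token after frame i
            acc -= beta           # keep the leftover weight
    return boundaries

print(cif_boundaries([0.4, 0.4, 0.4, 0.9, 0.1]))  # [2, 3]
```

Because the decision to write depends only on accumulated weight so far, the policy adapts to the input stream rather than following a fixed read/write schedule.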
arXiv Detail & Related papers (2022-03-22T23:33:18Z)
- Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration [83.96729205383501]
We introduce prompt-based learning to achieve fast adaptation for language embeddings.
Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
arXiv Detail & Related papers (2022-03-08T11:01:24Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Adapting to Dynamic LEO-B5G Systems: Meta-Critic Learning Based Efficient Resource Scheduling [38.733584547351796]
We address two practical issues for an over-loaded LEO-terrestrial system.
The first challenge is how to efficiently schedule resources to serve the massive number of connected users.
The second challenge is how to make the algorithmic solution more resilient in adapting to dynamic wireless environments.
arXiv Detail & Related papers (2021-10-13T15:21:38Z)
- Sim-to-Real Transfer with Incremental Environment Complexity for Reinforcement Learning of Depth-Based Robot Navigation [1.290382979353427]
A Soft Actor-Critic (SAC) training strategy using incremental environment complexity is proposed to drastically reduce the need for additional training in the real world.
The application addressed is depth-based mapless navigation, where a mobile robot should reach a given waypoint in a cluttered environment with no prior mapping information.
arXiv Detail & Related papers (2020-04-30T10:47:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.