Efficient Reinforcement Learning from Demonstration Using Local Ensemble
and Reparameterization with Split and Merge of Expert Policies
- URL: http://arxiv.org/abs/2205.11019v1
- Date: Mon, 23 May 2022 03:36:24 GMT
- Title: Efficient Reinforcement Learning from Demonstration Using Local Ensemble
and Reparameterization with Split and Merge of Expert Policies
- Authors: Yu Wang, Fang Liu
- Abstract summary: A policy learned from sub-optimal demonstrations may mislead an agent into incorrect or non-local action decisions.
We propose a new method called Local Ensemble and Reparameterization with Split and Merge of expert policies (LEARN-SAM) to improve efficiency and make better use of the sub-optimal demonstrations.
We demonstrate the superiority of the LEARN-SAM method and its robustness with varying demonstration quality and sparsity in six experiments on complex continuous control problems of low to high dimensions.
- Score: 7.126594773940676
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The current work on reinforcement learning (RL) from demonstrations often
assumes the demonstrations are samples from an optimal policy, an unrealistic
assumption in practice. When demonstrations are generated by sub-optimal
policies or contain sparse state-action pairs, a policy learned from such
demonstrations may mislead an agent into incorrect or non-local action
decisions. We propose a new method called Local Ensemble and Reparameterization
with Split and Merge of expert policies (LEARN-SAM) to improve efficiency and
make better use of the sub-optimal demonstrations. First, LEARN-SAM employs a
new concept, the lambda-function, based on a discrepancy measure between the
current state and the demonstrated states, to "localize" the weights of the expert
policies during learning. Second, LEARN-SAM employs a split-and-merge (SAM)
mechanism by separating the helpful parts in each expert demonstration and
regrouping them into new expert policies to use the demonstrations selectively.
Both the lambda-function and SAM mechanism help boost the learning speed.
Theoretically, we prove the invariance of the reparameterized policy before
and after the SAM mechanism, providing theoretical guarantees for the
convergence of the employed policy gradient method. We demonstrate the
superiority of the LEARN-SAM method and its robustness with varying
demonstration quality and sparsity in six experiments on complex continuous
control problems of low to high dimensions, compared to existing methods on RL
from demonstration.
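The abstract describes the lambda-function only at a high level. A minimal sketch of one plausible reading is given below, in which each expert policy's weight is attenuated by the distance from the current state to that expert's nearest demonstrated state; the Euclidean metric, the Gaussian kernel, and the bandwidth sigma are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

def lambda_weights(state, expert_demos, sigma=1.0):
    """Localized weights for a set of expert policies (illustrative sketch).

    state        : current state, shape (d,)
    expert_demos : list of arrays, one per expert, each of shape (n_i, d),
                   holding that expert's demonstrated states
    sigma        : assumed kernel bandwidth controlling how quickly an
                   expert's influence decays with state discrepancy
    """
    weights = []
    for demo_states in expert_demos:
        # Discrepancy = distance from the current state to the closest
        # demonstrated state of this expert.
        discrepancy = np.linalg.norm(demo_states - state, axis=1).min()
        # Experts whose demonstrations lie far from the current state
        # contribute little ("localization").
        weights.append(np.exp(-discrepancy ** 2 / (2.0 * sigma ** 2)))
    weights = np.asarray(weights)
    total = weights.sum()
    return weights / total if total > 0 else weights
```

Under this reading, an expert's advice is trusted only near the states it actually demonstrated, which is the localization effect described in the abstract.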
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, (hyper)policies are learned to convergence only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning [54.40927310957792]
We introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team.
These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements.
We propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate.
arXiv Detail & Related papers (2024-03-13T20:11:20Z)
- Dr.ICL: Demonstration-Retrieved In-context Learning [29.142262267850704]
In-context learning (ICL), teaching a large language model to perform a task with few-shot demonstrations, has emerged as a strong paradigm for using LLMs.
Recent research suggests that retrieving semantically similar demonstrations to the input from a pool of available demonstrations results in better performance.
This work expands the applicability of retrieval-based ICL approaches by demonstrating that even simple word-overlap similarity measures such as BM25 outperform randomly selected demonstrations.
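As a hedged illustration of this word-overlap retrieval idea (not Dr.ICL's actual pipeline), the sketch below scores a small pool of demonstrations against the test input with BM25 via the rank_bm25 package and keeps the top-k as few-shot examples; the pool, the sentiment task, and k are placeholders.

```python
from rank_bm25 import BM25Okapi

# Hypothetical pool of (input, label) demonstration pairs.
demo_pool = [
    ("the movie was wonderful", "positive"),
    ("the plot was dull and slow", "negative"),
    ("a charming, well acted film", "positive"),
]

corpus = [x for x, _ in demo_pool]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "the film was dull and poorly acted"
scores = bm25.get_scores(query.split())

# Keep the k demonstrations with the highest word-overlap scores.
k = 2
top_idx = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]

# Build a few-shot prompt from the retrieved demonstrations.
prompt = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in (demo_pool[i] for i in top_idx))
prompt += f"\nReview: {query}\nSentiment:"
print(prompt)
```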
arXiv Detail & Related papers (2023-05-23T14:55:25Z)
- Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
- DEFENDER: DTW-Based Episode Filtering Using Demonstrations for Enhancing RL Safety [0.0]
We propose a task-agnostic method that leverages small sets of safe and unsafe demonstrations to improve the safety of RL agents during learning.
We evaluate our method on three tasks from OpenAI Gym's Mujoco benchmark and two state-of-the-art RL algorithms.
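The DTW-based filtering rule is only named above; as a rough sketch under assumed details, the snippet below computes a dynamic-time-warping distance between a candidate episode and each demonstration and flags the episode when it lies closer to the unsafe demonstrations than to the safe ones. The Euclidean step cost and this nearest-set decision rule are assumptions, not DEFENDER's published criterion.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between state trajectories a (n, d) and b (m, d)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def looks_unsafe(episode, safe_demos, unsafe_demos):
    # Assumed rule: filter the episode out of training if it resembles the
    # unsafe demonstrations more closely than the safe ones.
    d_safe = min(dtw_distance(episode, d) for d in safe_demos)
    d_unsafe = min(dtw_distance(episode, d) for d in unsafe_demos)
    return d_unsafe < d_safe
```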
arXiv Detail & Related papers (2023-05-08T14:23:27Z)
- Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation [1.7901837062462316]
This paper aims to define and incorporate the natural symmetry present in physical robotic environments.
The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle.
A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.
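The paper's symmetry definition is not reproduced here; one common way to exploit environment symmetry in off-policy RL, sketched below under assumed details, is to mirror each collected transition about a symmetry axis and store the reflected copy in the replay buffer as extra data. The single-axis reflection is an assumption chosen purely for illustration.

```python
import numpy as np

def mirror_transition(state, action, next_state, axis=0):
    """Reflect a transition about one coordinate axis (illustrative only).

    Valid only when the environment's dynamics and reward are symmetric
    about this axis, so the mirrored transition is as real as the original.
    """
    def mirror(x):
        x = np.array(x, dtype=float)
        x[axis] = -x[axis]
        return x

    return mirror(state), mirror(action), mirror(next_state)

# Usage with a hypothetical replay buffer: store both the collected
# transition and its mirror image.
# buffer.add(s, a, s_next)
# buffer.add(*mirror_transition(s, a, s_next))
```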
arXiv Detail & Related papers (2023-04-12T11:38:01Z)
- D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning [48.57484755946714]
D-Shape is a new method for combining imitation learning (IL) and reinforcement learning (RL); it uses ideas from reward shaping and goal-conditioned RL to resolve the conflict between the two objectives.
We experimentally validate D-Shape in sparse-reward gridworld domains, showing that it both improves over RL in terms of sample efficiency and converges consistently to the optimal policy.
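D-Shape's exact shaping term is not given above; the sketch below shows generic potential-based reward shaping toward a demonstration-derived goal state, which is one way reward shaping and goal-conditioned RL can be combined without altering the optimal policy. The distance-based potential and the choice of goal are assumptions for illustration.

```python
import numpy as np

def shaped_reward(env_reward, state, next_state, goal, gamma=0.99):
    """Potential-based shaping toward a goal state taken from a demonstration.

    F(s, s') = gamma * phi(s') - phi(s) with phi(s) = -||s - goal|| is
    policy-invariant, so the shaped agent still optimizes the original task.
    """
    phi = lambda s: -np.linalg.norm(np.asarray(s, dtype=float) - np.asarray(goal, dtype=float))
    return env_reward + gamma * phi(next_state) - phi(state)
```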
arXiv Detail & Related papers (2022-10-26T02:28:32Z)
- Robust Learning from Observation with Model Misspecification [33.92371002674386]
Imitation learning (IL) is a popular paradigm for training policies in robotic systems.
We propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning.
arXiv Detail & Related papers (2022-02-12T07:04:06Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Learning Robust State Abstractions for Hidden-Parameter Block MDPs [55.31018404591743]
We leverage ideas of common structure from the HiP-MDP setting to enable robust state abstractions inspired by Block MDPs.
We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings.
arXiv Detail & Related papers (2020-07-14T17:25:27Z)