Strategy Discovery and Mixture in Lifelong Learning from Heterogeneous
Demonstration
- URL: http://arxiv.org/abs/2202.07014v1
- Date: Mon, 14 Feb 2022 20:10:25 GMT
- Title: Strategy Discovery and Mixture in Lifelong Learning from Heterogeneous
Demonstration
- Authors: Sravan Jayanthi, Letian Chen, Matthew Gombolay
- Abstract summary: Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors.
In this paper, we propose a novel algorithm, Dynamic Multi-Strategy Reward Distillation (DMSRD), which distills common knowledge between heterogeneous demonstrations.
Our personalized, federated, and lifelong LfD architecture surpasses benchmarks in two continuous control problems with an average 77% improvement in policy returns and 42% improvement in log likelihood.
- Score: 1.2891210250935146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from Demonstration (LfD) approaches empower end-users to teach
robots novel tasks via demonstrations of the desired behaviors, democratizing
access to robotics. A key challenge in LfD research is that users tend to
provide heterogeneous demonstrations for the same task due to various
strategies and preferences. Therefore, it is essential to develop LfD
algorithms that ensure flexibility (the robot adapts to personalized strategies),
efficiency (the robot achieves sample-efficient adaptation), and scalability (the
robot reuses a concise set of strategies to represent a large number of behaviors).
In this paper, we propose a novel
algorithm, Dynamic Multi-Strategy Reward Distillation (DMSRD), which distills
common knowledge between heterogeneous demonstrations, leverages learned
strategies to construct mixture policies, and continues to improve by learning
from all available data. Our personalized, federated, and lifelong LfD
architecture surpasses benchmarks in two continuous control problems with an
average 77% improvement in policy returns and 42% improvement in log
likelihood, alongside stronger task reward correlation and more precise
strategy rewards.
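To make the mixture idea concrete, below is a minimal, self-contained Python sketch of reusing learned strategies for a new demonstration: fit mixture weights over existing strategy policies by maximizing the demonstration's log-likelihood, and add a new strategy only when no mixture explains the data well. The toy Gaussian policies, the random-search optimizer, and the threshold are all placeholders invented for illustration; this is not the DMSRD implementation.

```python
# Illustrative sketch only: a toy version of the "mixture of learned strategies"
# idea from the abstract. StrategyPolicy, fit_mixture_weights, and the
# likelihood threshold are hypothetical placeholders, not the DMSRD implementation.
import numpy as np

class StrategyPolicy:
    """A toy Gaussian policy over 1-D actions, standing in for a learned strategy."""
    def __init__(self, mean: float, std: float = 0.1):
        self.mean, self.std = mean, std

    def log_prob(self, actions: np.ndarray) -> np.ndarray:
        return (-0.5 * ((actions - self.mean) / self.std) ** 2
                - np.log(self.std * np.sqrt(2 * np.pi)))

def mixture_log_likelihood(weights, policies, actions):
    """Log-likelihood of a demonstration under a weighted mixture of policies."""
    log_probs = np.stack([p.log_prob(actions) for p in policies])        # (K, T)
    weighted = np.log(np.clip(weights, 1e-8, 1.0))[:, None] + log_probs  # log(w_k) + log p_k
    return float(np.sum(np.logaddexp.reduce(weighted, axis=0)))          # sum_t log sum_k

def fit_mixture_weights(policies, actions, iters=2000, seed=0):
    """Crude random search over the probability simplex (dependency-free)."""
    rng = np.random.default_rng(seed)
    best_w, best_ll = None, -np.inf
    for _ in range(iters):
        w = rng.dirichlet(np.ones(len(policies)))
        ll = mixture_log_likelihood(w, policies, actions)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w, best_ll

# Lifelong loop: reuse existing strategies when a mixture explains the new
# demonstration well enough, otherwise instantiate a new strategy.
LIKELIHOOD_THRESHOLD = -50.0  # arbitrary toy value
strategies = [StrategyPolicy(0.0), StrategyPolicy(1.0)]
new_demo = np.concatenate([np.full(10, 0.02), np.full(10, 0.97)])  # half near each strategy

weights, ll = fit_mixture_weights(strategies, new_demo)
if ll >= LIKELIHOOD_THRESHOLD:
    print(f"reuse mixture {np.round(weights, 2)} (log-likelihood {ll:.1f})")
else:
    strategies.append(StrategyPolicy(float(new_demo.mean())))
    print("no good mixture found; added a new strategy")
```

The paper's actual algorithm learns per-strategy rewards via distillation and optimizes mixture policies with reinforcement learning; the sketch only captures the reuse-or-instantiate decision that underlies the scalability claim.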
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
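As a rough illustration of a critic defined over a sequence of actions rather than a single action, the sketch below concatenates an observation with a flattened action sequence and maps it to a scalar Q-value with a small MLP. The architecture, dimensions, and names are assumptions made for the example, not the cited paper's design.

```python
# Hypothetical sketch: a critic that scores a short action sequence with an MLP.
import torch
import torch.nn as nn

class ActionSequenceCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, horizon: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim * horizon, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, action_seq: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); action_seq: (batch, horizon, act_dim)
        flat_actions = action_seq.flatten(start_dim=1)
        return self.net(torch.cat([obs, flat_actions], dim=-1)).squeeze(-1)

critic = ActionSequenceCritic(obs_dim=8, act_dim=2, horizon=4)
q_values = critic(torch.randn(32, 8), torch.randn(32, 4, 2))  # one Q-value per batch element
```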
arXiv Detail & Related papers (2024-11-19T01:23:52Z) - Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - SERL: A Software Suite for Sample-Efficient Robotic Reinforcement
Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates and extreme robustness even under perturbations, and they exhibit emergent recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z) - Enhancing Robotic Navigation: An Evaluation of Single and
Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
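A loose sketch of that vector-reward idea: each step reports one reward component per objective, and a preference vector scalarizes them for a standard RL learner. The objectives and weights below are invented for illustration and are not taken from the cited study.

```python
# Toy sketch of a vector-valued reward: one component per objective, scalarized
# by a preference vector. Objectives and weights are hypothetical.
import numpy as np

def step_rewards(distance_to_goal: float, collision: bool, energy_used: float) -> np.ndarray:
    """Return one reward component per objective instead of a single scalar."""
    return np.array([
        -distance_to_goal,            # objective 1: make progress toward the goal
        -10.0 if collision else 0.0,  # objective 2: avoid collisions
        -energy_used,                 # objective 3: conserve energy
    ])

preference = np.array([0.6, 0.3, 0.1])  # relative importance of each objective
reward_vector = step_rewards(distance_to_goal=2.5, collision=False, energy_used=0.4)
scalar_reward = float(preference @ reward_vector)  # what a scalar-reward agent would consume
print(reward_vector, scalar_reward)
```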
arXiv Detail & Related papers (2023-12-13T08:00:26Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Fast Lifelong Adaptive Inverse Reinforcement Learning from
Demonstrations [1.6050172226234585]
We propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR).
We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability.
FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling.
arXiv Detail & Related papers (2022-09-24T02:48:02Z) - Weakly Supervised Disentangled Representation for Goal-conditioned
Reinforcement Learning [15.698612710580447]
We propose a skill-learning framework, DR-GRL, that aims to improve sample efficiency and policy generalization.
In a weakly supervised manner, we propose a Spatial Transform AutoEncoder (STAE) to learn an interpretable and controllable representation.
We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization.
arXiv Detail & Related papers (2022-02-28T09:05:14Z) - Automatic Goal Generation using Dynamical Distance Learning [5.797847756967884]
Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment.
In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging.
We propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion.
arXiv Detail & Related papers (2021-11-07T16:23:56Z) - Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic
Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate it as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z) - Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning
Systems [0.8223798883838329]
This research investigates how to integrate human interaction modalities into the reinforcement learning loop.
Results show that the reward signal that is learned based upon human interaction accelerates the rate of learning of reinforcement learning algorithms.
arXiv Detail & Related papers (2020-08-30T17:28:18Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model
Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.