Fast Lifelong Adaptive Inverse Reinforcement Learning from
Demonstrations
- URL: http://arxiv.org/abs/2209.11908v7
- Date: Wed, 12 Apr 2023 14:19:36 GMT
- Title: Fast Lifelong Adaptive Inverse Reinforcement Learning from
Demonstrations
- Authors: Letian Chen, Sravan Jayanthi, Rohan Paleja, Daniel Martin, Viacheslav
Zakharov, Matthew Gombolay
- Abstract summary: We propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR)
We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability.
FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling.
- Score: 1.6050172226234585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from Demonstration (LfD) approaches empower end-users to teach
robots novel tasks via demonstrations of the desired behaviors, democratizing
access to robotics. However, current LfD frameworks are not capable of fast
adaptation to heterogeneous human demonstrations nor of large-scale deployment
in ubiquitous robotics applications. In this paper, we propose a novel LfD
framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our
approach (1) leverages learned strategies to construct policy mixtures for fast
adaptation to new demonstrations, allowing for quick end-user personalization;
(2) distills common knowledge across demonstrations, achieving accurate task
inference; and (3) expands its model only when needed in lifelong deployments,
maintaining a concise set of prototypical strategies that can approximate all
behaviors via policy mixtures. We empirically validate that FLAIR achieves
adaptability (i.e., the robot adapts to heterogeneous, user-specific task
preferences), efficiency (i.e., the robot achieves sample-efficient
adaptation), and scalability (i.e., the model grows sublinearly with the number
of demonstrations while maintaining high performance). FLAIR surpasses
benchmarks across three control tasks with an average 57% improvement in policy
returns and an average 78% fewer episodes required for demonstration modeling
using policy mixtures. Finally, we demonstrate the success of FLAIR in a table
tennis task and find users rate FLAIR as having higher task (p<.05) and
personalization (p<.05) performance.
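The mixture-and-expand mechanism described in points (1) and (3) of the abstract can be sketched compactly. The snippet below is a minimal illustration, not the authors' implementation: per-step action log-probabilities of a new demonstration under each learned prototype policy are combined with EM-fitted mixture weights, and a new prototype is created only when no mixture explains the demonstration well enough. All function names, the EM fitting procedure, and the acceptance threshold are illustrative assumptions; FLAIR's actual inference and expansion criteria are given in the paper.
```python
# Hypothetical sketch of policy-mixture adaptation with model expansion.
import numpy as np


def fit_mixture_weights(step_logps, n_iters=100):
    """step_logps[t, k] = log pi_k(a_t | s_t): per-step log-probability of the
    demonstrated action under prototype policy k. Fits simplex weights w that
    maximize sum_t log(sum_k w_k * pi_k(a_t | s_t)) via EM updates."""
    T, K = step_logps.shape
    w = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E-step: responsibility of each prototype for each demonstrated step.
        log_joint = np.log(w) + step_logps                      # shape (T, K)
        log_joint -= log_joint.max(axis=1, keepdims=True)       # numerical stability
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: new mixture weights are the average responsibilities.
        w = resp.mean(axis=0)
    return w


def mixture_log_likelihood(step_logps, w):
    """Average per-step log-likelihood of the demonstration under the mixture."""
    shift = step_logps.max(axis=1, keepdims=True)
    per_step = np.log((np.exp(step_logps - shift) * w).sum(axis=1)) + shift[:, 0]
    return per_step.mean()


def explain_or_expand(step_logps, fit_threshold=-2.0):
    """Lifelong step: reuse a mixture of existing prototypes if it models the
    demonstration well enough; otherwise signal that a new prototype strategy
    should be learned, keeping model growth sublinear in demonstrations."""
    w = fit_mixture_weights(step_logps)
    if mixture_log_likelihood(step_logps, w) >= fit_threshold:
        return "mixture", w                   # fast adaptation via existing strategies
    return "new_prototype", None              # learn a new strategy (e.g., via IRL)


# Toy usage: a 3-prototype library and a 100-step demonstration.
rng = np.random.default_rng(0)
demo_step_logps = rng.normal(loc=-1.5, scale=0.5, size=(100, 3))
print(explain_or_expand(demo_step_logps))
```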
Related papers
- One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
One-Step Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator.
OneDP significantly accelerates response times for robotic control tasks.
arXiv Detail & Related papers (2024-10-28T17:54:31Z)
- FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance on both previously demonstrated tasks and entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z)
- EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning [36.0274770291531]
We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning.
Our approach combines SIM(3)-equivariant neural network architectures with diffusion models.
We show that our method can easily generalize to novel objects and scenes after learning from just 5 minutes of human demonstrations in each task.
arXiv Detail & Related papers (2024-07-01T17:09:43Z)
- Riemannian Flow Matching Policy for Robot Motion Learning [5.724027955589408]
We introduce Riemannian Flow Matching Policy (RFMP), a novel model for learning and synthesizing robot visuomotor policies.
We show that RFMP provides smoother action trajectories with significantly lower inference times.
arXiv Detail & Related papers (2024-03-15T20:48:41Z)
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z)
- Strategy Discovery and Mixture in Lifelong Learning from Heterogeneous Demonstration [1.2891210250935146]
Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors.
In this paper, we propose a novel algorithm, Dynamic Multi-Strategy Reward Distillation (DMSRD), which distills common knowledge between heterogeneous demonstrations.
Our personalized, federated, and lifelong LfD architecture surpasses benchmarks in two continuous control problems with an average 77% improvement in policy returns and 42% improvement in log likelihood.
arXiv Detail & Related papers (2022-02-14T20:10:25Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate policy adaptation across platforms as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
- Learning from Suboptimal Demonstration via Self-Supervised Reward Regression [1.2891210250935146]
Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration.
Modern LfD techniques, e.g. inverse reinforcement learning (IRL), assume users provide at least stochastically optimal demonstrations.
We show these approaches make incorrect assumptions and thus suffer from brittle, degraded performance.
We present a physical demonstration of teaching a robot a topspin strike in table tennis that achieves 32% faster returns and 40% more topspin than the user's demonstration.
arXiv Detail & Related papers (2020-10-17T04:18:04Z)
- Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning [109.77163932886413]
We show how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning.
This adaptation uses less than 0.2% of the data necessary to learn the task from scratch.
We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning.
arXiv Detail & Related papers (2020-04-21T17:57:04Z)