Fast Lifelong Adaptive Inverse Reinforcement Learning from
Demonstrations
- URL: http://arxiv.org/abs/2209.11908v7
- Date: Wed, 12 Apr 2023 14:19:36 GMT
- Title: Fast Lifelong Adaptive Inverse Reinforcement Learning from
Demonstrations
- Authors: Letian Chen, Sravan Jayanthi, Rohan Paleja, Daniel Martin, Viacheslav
Zakharov, Matthew Gombolay
- Abstract summary: We propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR).
We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability.
FLAIR surpasses benchmarks across three control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling.
- Score: 1.6050172226234585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from Demonstration (LfD) approaches empower end-users to teach
robots novel tasks via demonstrations of the desired behaviors, democratizing
access to robotics. However, current LfD frameworks are not capable of fast
adaptation to heterogeneous human demonstrations, nor of large-scale deployment
in ubiquitous robotics applications. In this paper, we propose a novel LfD
framework, Fast Lifelong Adaptive Inverse Reinforcement Learning (FLAIR). Our
approach (1) leverages learned strategies to construct policy mixtures for fast
adaptation to new demonstrations, allowing for quick end-user personalization;
(2) distills common knowledge across demonstrations, achieving accurate task
inference; and (3) expands its model only when needed in lifelong deployments,
maintaining a concise set of prototypical strategies that can approximate all
behaviors via policy mixtures. We empirically validate that FLAIR achieves
adaptability (i.e., the robot adapts to heterogeneous, user-specific task
preferences), efficiency (i.e., the robot achieves sample-efficient
adaptation), and scalability (i.e., the model grows sublinearly with the number
of demonstrations while maintaining high performance). FLAIR surpasses
benchmarks across three control tasks with an average 57% improvement in policy
returns and an average 78% fewer episodes required for demonstration modeling
using policy mixtures. Finally, we demonstrate the success of FLAIR in a table
tennis task and find users rate FLAIR as having higher task (p<.05) and
personalization (p<.05) performance.
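To make the mixture construction concrete, here is a minimal sketch of adapting to a new demonstration by re-weighting a fixed set of previously learned strategy policies, expanding the set only when no mixture fits well. The 1-D Gaussian policy interface, the finite-difference optimizer, and the acceptance threshold are illustrative assumptions; FLAIR's actual inverse-RL formulation is not reproduced here.
```python
# Minimal sketch: adapt to a new demonstration by fitting mixture weights over
# previously learned strategy policies. The 1-D Gaussian policy interface, the
# finite-difference optimizer, and the threshold are illustrative assumptions.
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def mixture_log_likelihood(weights, demo, policies):
    """demo: list of (state, action) pairs; each policy maps a state to the
    (mean, std) of a 1-D Gaussian over actions."""
    ll = 0.0
    for s, a in demo:
        densities = np.array([
            np.exp(-0.5 * ((a - mu) / std) ** 2) / (std * np.sqrt(2 * np.pi))
            for mu, std in (pi(s) for pi in policies)
        ])
        ll += np.log(weights @ densities + 1e-12)
    return ll

def fit_mixture(demo, policies, steps=300, lr=0.5, eps=1e-4):
    """Gradient ascent on the demonstration log-likelihood over simplex
    weights, parametrized by softmax logits; finite differences keep this
    sketch free of any autodiff dependency."""
    logits = np.zeros(len(policies))
    for _ in range(steps):
        base = mixture_log_likelihood(softmax(logits), demo, policies)
        grad = np.zeros_like(logits)
        for i in range(len(logits)):
            bumped = logits.copy()
            bumped[i] += eps
            grad[i] = (mixture_log_likelihood(softmax(bumped), demo, policies) - base) / eps
        logits += lr * grad
    w = softmax(logits)
    return w, mixture_log_likelihood(w, demo, policies)

# Lifelong loop (THRESHOLD and train_new_strategy are hypothetical):
# weights, ll = fit_mixture(demo, strategies)
# if ll / len(demo) < THRESHOLD:
#     strategies.append(train_new_strategy(demo))  # expand only when needed
```
Because adaptation fits only a handful of mixture weights rather than a whole new policy, each new user costs few episodes, which is the sample-efficiency property the abstract highlights.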
Related papers
- DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control [7.626715427413578]
Vision-language-action (VLA) models have shown promise for generalizable robot skills.
Current VLA models often focus on scaling the vision-language model (VLM) component, while the action space representation remains a critical bottleneck.
This paper introduces DexVLA, a novel framework designed to enhance the efficiency and generalization capabilities of VLAs for complex, long-horizon tasks.
arXiv Detail & Related papers (2025-02-09T11:25:56Z)
- Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [50.191655141020505]
We introduce a novel framework for learning world models.
By providing a scalable and robust framework, we pave the way for adaptive and efficient robotic systems in real-world applications.
arXiv Detail & Related papers (2025-01-17T10:39:09Z)
- FAST: Efficient Action Tokenization for Vision-Language-Action Models [98.15494168962563]
We propose a new compression-based tokenization scheme for robot actions, based on the discrete cosine transform.
Based on FAST, we release FAST+, a universal robot action tokenizer, trained on 1M real robot action trajectories.
arXiv Detail & Related papers (2025-01-16T18:57:04Z)
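The summary above names the core mechanism: compress an action chunk with the discrete cosine transform, then quantize. A hedged round-trip sketch follows; the quantization scale and coefficient truncation are illustrative assumptions, and the full pipeline reportedly compresses the quantized coefficients further (byte-pair encoding), which this sketch omits.
```python
# Hedged sketch of DCT-based action tokenization: transform an action chunk,
# quantize the coefficients, keep the leading (low-frequency) ones. The scale
# and truncation are assumptions; the released tokenizer adds further lossless
# compression of the quantized coefficients, omitted here.
import numpy as np
from scipy.fft import dct, idct

def tokenize_actions(chunk, scale=10.0, keep=8):
    """chunk: (T, action_dim) array of continuous actions.
    Returns integer tokens for the leading DCT coefficients per dimension."""
    coeffs = dct(chunk, axis=0, norm="ortho")             # decorrelate in time
    return np.round(coeffs[:keep] * scale).astype(int)    # quantize low freqs

def detokenize_actions(tokens, horizon, scale=10.0):
    """Invert the quantized DCT back to an action chunk of length `horizon`."""
    coeffs = np.zeros((horizon, tokens.shape[1]))
    coeffs[: tokens.shape[0]] = tokens / scale
    return idct(coeffs, axis=0, norm="ortho")

# Round-trip example on a smooth 50-step, 7-DoF action trajectory.
chunk = np.cumsum(0.01 * np.random.randn(50, 7), axis=0)
recon = detokenize_actions(tokenize_actions(chunk), horizon=50)
print("max reconstruction error:", np.abs(chunk - recon).max())
```
Truncating to low-frequency coefficients is what makes this a compression scheme: smooth robot trajectories concentrate their energy there, so few tokens suffice.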
- GRAPE: Generalizing Robot Policy via Preference Alignment [58.419992317452376]
We present GRAPE: Generalizing Robot Policy via Preference Alignment.
We show GRAPE increases success rates on in-domain and unseen manipulation tasks by 51.79% and 58.20%, respectively.
GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 37.44% and rollout step-length by 11.15%, respectively.
arXiv Detail & Related papers (2024-11-28T18:30:10Z)
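GRAPE's blurb does not spell out its objective, so the following is only a generic, DPO-style trajectory preference loss, illustrating what "preference alignment" of a robot policy can look like; the `policy(s)` distribution interface and the `beta` temperature are assumptions, not GRAPE's actual method.
```python
# Generic trajectory-level preference alignment sketch (DPO-style): push the
# policy's trajectory log-probability toward the preferred trajectory relative
# to a frozen reference policy. Not GRAPE's exact loss.
import torch
import torch.nn.functional as F

def trajectory_logprob(policy, states, actions):
    """Sum of per-step action log-probs; `policy(s)` is assumed to return a
    torch.distributions.Distribution over actions (an assumption)."""
    return sum(policy(s).log_prob(a).sum() for s, a in zip(states, actions))

def preference_loss(policy, ref_policy, preferred, dispreferred, beta=0.1):
    """preferred / dispreferred: (states, actions) trajectory pairs."""
    lp_w = trajectory_logprob(policy, *preferred)
    lp_l = trajectory_logprob(policy, *dispreferred)
    with torch.no_grad():
        ref_w = trajectory_logprob(ref_policy, *preferred)
        ref_l = trajectory_logprob(ref_policy, *dispreferred)
    # Logistic loss on the margin of policy-vs-reference trajectory log-ratios.
    return -F.logsigmoid(beta * ((lp_w - ref_w) - (lp_l - ref_l)))
```
Ranking whole trajectories, rather than single actions, is what lets such an objective encode preferences like safety or efficiency over a rollout.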
- FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance on previously demonstrated and on entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z)
- EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning [36.0274770291531]
We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning.
Our approach combines SIM(3)-equivariant neural network architectures with diffusion models.
We show that our method can easily generalize to novel objects and scenes after learning from just 5 minutes of human demonstrations in each task.
arXiv Detail & Related papers (2024-07-01T17:09:43Z)
- Riemannian Flow Matching Policy for Robot Motion Learning [5.724027955589408]
We introduce Riemannian Flow Matching Policy (RFMP), a novel model for learning and synthesizing robot visuomotor policies.
We show that RFMP provides smoother action trajectories with significantly lower inference times.
arXiv Detail & Related papers (2024-03-15T20:48:41Z)
- Meta-Learning with Self-Improving Momentum Target [72.98879709228981]
We propose Self-improving Momentum Target (SiMT) to improve the performance of a meta-learner.
SiMT generates the target model by adapting from the temporal ensemble of the meta-learner.
We show that SiMT brings a significant performance gain when combined with a wide range of meta-learning methods.
arXiv Detail & Related papers (2022-10-11T06:45:15Z)
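The SiMT summary sketches its mechanism: a target model derived from a temporal ensemble (momentum copy) of the meta-learner, whose predictions the meta-learner then distills. Below is a minimal sketch under that reading; the EMA rate, the MSE distillation term, and the plain supervised step are illustrative assumptions, and SiMT's target-adaptation procedure is not reproduced.
```python
# Hedged sketch of a temporal-ensemble (momentum) target with distillation:
# keep an EMA copy of the learner's parameters and regularize the learner's
# predictions toward that copy. EMA rate and loss weights are assumptions.
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_momentum_target(model, target, tau=0.995):
    """Temporal ensemble: target <- tau * target + (1 - tau) * model."""
    for p, tp in zip(model.parameters(), target.parameters()):
        tp.mul_(tau).add_(p, alpha=1.0 - tau)

def distillation_step(model, target, x, y, optimizer, alpha=0.5):
    """One training step: task loss plus distillation toward the target."""
    logits = model(x)
    with torch.no_grad():
        target_logits = target(x)
    loss = F.cross_entropy(logits, y) + alpha * F.mse_loss(logits, target_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_momentum_target(model, target)
    return loss.item()

# target = copy.deepcopy(model)  # initialize the momentum target from the model
```
Since the EMA copy lags behind and averages over recent iterates, it supplies a stabler teaching signal than the learner's own latest weights.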
- Strategy Discovery and Mixture in Lifelong Learning from Heterogeneous Demonstration [1.2891210250935146]
Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors.
In this paper, we propose a novel algorithm, Dynamic Multi-Strategy Reward Distillation (DMSRD), which distills common knowledge between heterogeneous demonstrations.
Our personalized, federated, and lifelong LfD architecture surpasses benchmarks in two continuous control problems with an average 77% improvement in policy returns and 42% improvement in log likelihood.
arXiv Detail & Related papers (2022-02-14T20:10:25Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.