Augmenting Policy Learning with Routines Discovered from a Demonstration
- URL: http://arxiv.org/abs/2012.12469v3
- Date: Fri, 9 Apr 2021 10:11:00 GMT
- Title: Augmenting Policy Learning with Routines Discovered from a Demonstration
- Authors: Zelin Zhao, Chuang Gan, Jiajun Wu, Xiaoxiao Guo, Joshua B. Tenenbaum
- Abstract summary: We propose routine-augmented policy learning (RAPL)
RAPL discovers routines composed of primitive actions from a single demonstration.
We show that RAPL improves the state-of-the-art imitation learning method SQIL and reinforcement learning method A2C.
- Score: 86.9307760606403
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Humans can abstract prior knowledge from very little data and use it to boost
skill learning. In this paper, we propose routine-augmented policy learning
(RAPL), which discovers routines composed of primitive actions from a single
demonstration and uses discovered routines to augment policy learning. To
discover routines from the demonstration, we first abstract routine candidates
by identifying grammar over the demonstrated action trajectory. Then, the best
routines measured by length and frequency are selected to form a routine
library. We propose to learn policy simultaneously at primitive-level and
routine-level with discovered routines, leveraging the temporal structure of
routines. Our approach enables imitating expert behavior at multiple temporal
scales for imitation learning and promotes reinforcement learning exploration.
Extensive experiments on Atari games demonstrate that RAPL improves the
state-of-the-art imitation learning method SQIL and reinforcement learning
method A2C. Further, we show that discovered routines can generalize to unseen
levels and difficulties on the CoinRun benchmark.
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z) - PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control [55.81022882408587]
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making.
We propose a novel view that treats inducing temporal action abstractions as a sequence compression problem.
We introduce an approach that combines continuous action quantization with byte pair encoding to learn powerful action abstractions.
arXiv Detail & Related papers (2024-02-16T04:55:09Z) - You Only Live Once: Single-Life Reinforcement Learning [124.1738675154651]
In many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial.
We formalize this problem setting, where an agent must complete a task within a single episode without interventions.
We propose an algorithm, $Q$-weighted adversarial learning (QWALE), which employs a distribution matching strategy.
arXiv Detail & Related papers (2022-10-17T09:00:11Z) - Learning Dense Reward with Temporal Variant Self-Supervision [5.131840233837565]
Complex real-world robotic applications lack explicit and informative descriptions that can directly be used as rewards.
Previous effort has shown that it is possible to algorithmically extract dense rewards directly from multimodal observations.
This paper proposes a more efficient and robust way of sampling and learning.
arXiv Detail & Related papers (2022-05-20T20:30:57Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Lifelong Learning from Event-based Data [22.65311698505554]
We investigate methods for learning from data produced by event cameras.
We propose a model that is composed of both, feature extraction and continuous learning.
arXiv Detail & Related papers (2021-11-11T17:59:41Z) - Demonstration-Guided Reinforcement Learning with Learned Skills [23.376115889936628]
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors.
In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL.
We propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations.
arXiv Detail & Related papers (2021-07-21T17:59:34Z) - Learning Invariant Representation for Continual Learning [5.979373021392084]
A key challenge in Continual learning is catastrophically forgetting previously learned tasks when the agent faces a new one.
We propose a new pseudo-rehearsal-based method, named learning Invariant Representation for Continual Learning (IRCL)
Disentangling the shared invariant representation helps to learn continually a sequence of tasks, while being more robust to forgetting and having better knowledge transfer.
arXiv Detail & Related papers (2021-01-15T15:12:51Z) - Reward-Conditioned Policies [100.64167842905069]
imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.