Augmenting Policy Learning with Routines Discovered from a Demonstration
- URL: http://arxiv.org/abs/2012.12469v3
- Date: Fri, 9 Apr 2021 10:11:00 GMT
- Title: Augmenting Policy Learning with Routines Discovered from a Demonstration
- Authors: Zelin Zhao, Chuang Gan, Jiajun Wu, Xiaoxiao Guo, Joshua B. Tenenbaum
- Abstract summary: We propose routine-augmented policy learning (RAPL).
RAPL discovers routines composed of primitive actions from a single demonstration.
We show that RAPL improves the state-of-the-art imitation learning method SQIL and reinforcement learning method A2C.
- Score: 86.9307760606403
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Humans can abstract prior knowledge from very little data and use it to boost
skill learning. In this paper, we propose routine-augmented policy learning
(RAPL), which discovers routines composed of primitive actions from a single
demonstration and uses discovered routines to augment policy learning. To
discover routines from the demonstration, we first abstract routine candidates
by identifying a grammar over the demonstrated action trajectory. Then, the best
routines, measured by length and frequency, are selected to form a routine
library. We propose to learn the policy simultaneously at the primitive and
routine levels with the discovered routines, leveraging the temporal structure of
routines. Our approach enables imitating expert behavior at multiple temporal
scales for imitation learning and promotes reinforcement learning exploration.
Extensive experiments on Atari games demonstrate that RAPL improves the
state-of-the-art imitation learning method SQIL and reinforcement learning
method A2C. Further, we show that discovered routines can generalize to unseen
levels and difficulties on the CoinRun benchmark.
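As a concrete illustration of the discovery step described in the abstract, below is a minimal Python sketch, assuming a BPE-style pair-merging "grammar" over the discrete action sequence and a length-times-frequency selection score. The function names, hyperparameters, and toy action ids are illustrative assumptions, not the paper's implementation; RAPL's actual grammar-induction and selection procedures may differ in detail.

```python
# Minimal sketch of RAPL-style routine discovery from a single demonstration.
# Assumption: routines are abstracted by a BPE-style pair-merging "grammar"
# over the discrete action sequence and selected by a length * frequency score;
# the paper's exact grammar-induction and selection procedures may differ.
from collections import Counter
from typing import List, Tuple

Routine = Tuple[int, ...]  # a routine is a fixed sequence of primitive action ids


def merge_most_frequent_pair(seq: List[Routine]) -> List[Routine]:
    """Replace every occurrence of the most frequent adjacent token pair with one merged token."""
    pair_counts = Counter(zip(seq, seq[1:]))
    if not pair_counts:
        return seq
    (a, b), count = pair_counts.most_common(1)[0]
    if count < 2:              # nothing repeats any more; stop merging
        return seq
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            merged.append(a + b)   # concatenate the two action tuples
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def discover_routines(demo_actions: List[int], num_merges: int = 10,
                      library_size: int = 4) -> List[Routine]:
    """Abstract routine candidates by repeated pair merging, then keep the
    candidates with the highest length * frequency score."""
    seq: List[Routine] = [(a,) for a in demo_actions]   # start from primitive actions
    candidates: Counter = Counter()
    for _ in range(num_merges):
        new_seq = merge_most_frequent_pair(seq)
        if new_seq == seq:
            break
        seq = new_seq
        for tok, cnt in Counter(seq).items():
            if len(tok) > 1:     # multi-action tokens are routine candidates
                candidates[tok] = max(candidates[tok], cnt)
    ranked = sorted(candidates, key=lambda r: len(r) * candidates[r], reverse=True)
    return ranked[:library_size]


# Toy demonstration over primitive action ids (illustrative, not Atari data).
demo = [2, 3, 2, 3, 1, 2, 3, 2, 3, 1, 4, 2, 3, 2, 3, 1]
print(discover_routines(demo))   # e.g. [(2, 3, 2, 3, 1), (2, 3), (2, 3, 2, 3)]
```

In RAPL, the selected routines are then used alongside primitive actions, so the policy can imitate expert behavior and explore at both temporal scales.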
Related papers
- PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control [55.81022882408587]
Temporal action abstractions, along with belief state representations, are powerful knowledge-sharing mechanisms for sequential decision making.
We propose a novel view that treats inducing temporal action abstractions as a sequence compression problem.
We introduce an approach that combines continuous action quantization with byte pair encoding to learn powerful action abstractions.
arXiv Detail & Related papers (2024-02-16T04:55:09Z) - You Only Live Once: Single-Life Reinforcement Learning [124.1738675154651]
In many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial.
We formalize this problem setting, where an agent must complete a task within a single episode without interventions.
We propose an algorithm, $Q$-weighted adversarial learning (QWALE), which employs a distribution matching strategy.
arXiv Detail & Related papers (2022-10-17T09:00:11Z) - Learning Dense Reward with Temporal Variant Self-Supervision [5.131840233837565]
Complex real-world robotic applications lack explicit and informative descriptions that can directly be used as rewards.
Prior work has shown that it is possible to algorithmically extract dense rewards directly from multimodal observations.
This paper proposes a more efficient and robust way of sampling and learning.
arXiv Detail & Related papers (2022-05-20T20:30:57Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Lifelong Learning from Event-based Data [22.65311698505554]
We investigate methods for learning from data produced by event cameras.
We propose a model that is composed of both feature extraction and continuous learning.
arXiv Detail & Related papers (2021-11-11T17:59:41Z) - Demonstration-Guided Reinforcement Learning with Learned Skills [23.376115889936628]
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors.
In this work, we aim to exploit the subtask structure shared across tasks to increase the efficiency of demonstration-guided RL.
We propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations.
arXiv Detail & Related papers (2021-07-21T17:59:34Z) - Learning Invariant Representation for Continual Learning [5.979373021392084]
A key challenge in continual learning is catastrophic forgetting of previously learned tasks when the agent faces a new one.
We propose a new pseudo-rehearsal-based method, named learning Invariant Representation for Continual Learning (IRCL).
Disentangling the shared invariant representation helps to learn continually a sequence of tasks, while being more robust to forgetting and having better knowledge transfer.
arXiv Detail & Related papers (2021-01-15T15:12:51Z) - Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z) - Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)