Reset-Free Lifelong Learning with Skill-Space Planning
- URL: http://arxiv.org/abs/2012.03548v2
- Date: Fri, 1 Jan 2021 10:49:54 GMT
- Title: Reset-Free Lifelong Learning with Skill-Space Planning
- Authors: Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch
- Abstract summary: We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL.
LiSP learns skills in an unsupervised manner using intrinsic rewards and plans over the learned skills using a learned dynamics model.
We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments.
- Score: 105.00539596788127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The objective of lifelong reinforcement learning (RL) is to optimize agents
which can continuously adapt and interact in changing environments. However,
current RL approaches fail drastically when environments are non-stationary and
interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an
algorithmic framework for non-episodic lifelong RL based on planning in an
abstract space of higher-order skills. We learn the skills in an unsupervised
manner using intrinsic rewards and plan over the learned skills using a learned
dynamics model. Moreover, our framework permits skill discovery even from
offline data, thereby reducing the need for excessive real-world interactions.
We demonstrate empirically that LiSP successfully enables long-horizon planning
and learns agents that can avoid catastrophic failures even in challenging
non-stationary and non-episodic environments derived from gridworld and MuJoCo
benchmarks.
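As a rough illustration of this loop, the sketch below plans over skill latents with a learned dynamics model via random shooting and executes only the first skill, MPC-style. It is a minimal sketch under assumed shapes and hyperparameters (state and skill dimensions, horizon, candidate count), not the authors' implementation; `SkillDynamics`, `plan_skills`, and the toy reward are illustrative names.

```python
import torch
import torch.nn as nn

# Minimal sketch of planning in skill space (hedged illustration, not LiSP's code).
# A learned dynamics model predicts the next state given the current state and a
# skill latent z; the planner scores random skill sequences under a reward model.

STATE_DIM, SKILL_DIM = 8, 4  # assumed dimensions for the example

class SkillDynamics(nn.Module):
    """Predicts s' from (s, z); stands in for the learned dynamics model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + SKILL_DIM, 128), nn.ReLU(),
            nn.Linear(128, STATE_DIM),
        )
    def forward(self, s, z):
        return s + self.net(torch.cat([s, z], dim=-1))  # residual prediction

def plan_skills(dynamics, reward_fn, s0, horizon=5, n_candidates=256):
    """Random-shooting planner: sample skill sequences, roll them out in the
    model, return the first skill of the best-scoring sequence."""
    s = s0.expand(n_candidates, -1)
    z_seqs = torch.randn(n_candidates, horizon, SKILL_DIM)
    returns = torch.zeros(n_candidates)
    with torch.no_grad():
        for t in range(horizon):
            s = dynamics(s, z_seqs[:, t])
            returns += reward_fn(s)
    return z_seqs[returns.argmax(), 0]  # execute only the first skill (MPC-style)

# Usage with a toy reward (distance to the origin is penalized):
dynamics = SkillDynamics()
reward = lambda s: -s.norm(dim=-1)
first_skill = plan_skills(dynamics, reward, torch.zeros(1, STATE_DIM))
print(first_skill.shape)  # torch.Size([4])
```

In the full method, the chosen skill latent would condition a learned low-level policy rather than being an action itself; the toy reward above merely stands in for the task objective.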
Related papers
- Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs [12.572869123617783]
Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks.
Preference-based RL (PbRL) presents a pioneering framework that capitalizes on human preferences as pivotal reward signals.
We propose an LLM-enabled automatic preference generation framework named LLM4PG.
arXiv Detail & Related papers (2024-06-28T04:21:24Z)
- EXTRACT: Efficient Policy Learning by Extracting Transferable Robot Skills from Offline Data [22.471559284344462]
Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces.
While these methods can perform well in their training environments, they lack the flexibility to transfer to new tasks.
We demonstrate through experiments in sparse, image-based robot manipulation environments that EXTRACT can learn new tasks more quickly than prior works.
arXiv Detail & Related papers (2024-06-25T17:50:03Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
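For intuition, here is a hedged stand-in for action discretization in offline RL. The paper's scheme is adaptive and learned; fixed k-means quantization of a dataset's actions, shown below, is only an assumed substitute that illustrates the interface a discrete offline RL method would consume. The codebook size, helper names, and synthetic dataset are all assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hedged illustration of action discretization for offline RL. The paper's
# scheme is adaptive and learned; as a simple stand-in, we quantize the
# continuous actions in an offline dataset with k-means, so that a discrete
# offline RL method can operate over codebook indices.

rng = np.random.default_rng(0)
offline_actions = rng.uniform(-1.0, 1.0, size=(10_000, 6))  # fake dataset

K = 64  # codebook size (an assumption for the example)
codebook = KMeans(n_clusters=K, n_init=10, random_state=0).fit(offline_actions)

def quantize(a):
    """Map a continuous action to its nearest codebook index."""
    return codebook.predict(a.reshape(1, -1))[0]

def dequantize(idx):
    """Map a discrete index back to a continuous action for the environment."""
    return codebook.cluster_centers_[idx]

idx = quantize(np.zeros(6))
print(idx, dequantize(idx))
```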
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills [37.31853034449015]
We present a skill-based framework that enhances offline RL to overcome the long-horizon vehicle planning challenge.
Specifically, we design a variational autoencoder (VAE) to learn skills from offline demonstrations.
To mitigate the posterior collapse common in VAEs, we introduce a two-branch sequence encoder to capture both discrete options and continuous variations of the complex driving skills.
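A minimal sketch of such a two-branch encoder follows, assuming a GRU trajectory summarizer, a Gumbel-softmax branch for the discrete option, and a reparameterized Gaussian branch for the continuous variation. All dimensions and architectural choices here are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch of a two-branch sequence encoder for skill VAEs: one branch
# emits a discrete option (via Gumbel-softmax) and one a continuous latent.
# Dimensions and architecture are assumptions, not the paper's implementation.

class TwoBranchEncoder(nn.Module):
    def __init__(self, obs_dim=16, n_options=8, cont_dim=4, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.option_head = nn.Linear(hidden, n_options)  # discrete branch
        self.mu_head = nn.Linear(hidden, cont_dim)       # continuous branch
        self.logvar_head = nn.Linear(hidden, cont_dim)

    def forward(self, traj, tau=1.0):
        _, h = self.rnn(traj)                 # summarize the trajectory
        h = h.squeeze(0)
        option = F.gumbel_softmax(self.option_head(h), tau=tau, hard=True)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return option, z

enc = TwoBranchEncoder()
option, z = enc(torch.randn(2, 20, 16))  # batch of 2 trajectories, length 20
print(option.shape, z.shape)  # torch.Size([2, 8]) torch.Size([2, 4])
```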
arXiv Detail & Related papers (2023-09-24T11:51:17Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Continuous Coordination As a Realistic Scenario for Lifelong Learning [6.044372319762058]
We introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings.
We evaluate several recent MARL methods and benchmark state-of-the-art lifelong learning (LLL) algorithms under limited memory and computation.
We empirically show that agents trained in our setup are able to coordinate well with unseen agents, without the additional assumptions made by previous works.
arXiv Detail & Related papers (2021-03-04T18:44:03Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
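The basic mechanics, inferring an environment latent from recent experience and conditioning the policy on it, can be sketched as follows. The encoder architecture, window size, and dimensions are assumptions for illustration, not the paper's model, which is trained as a proper latent variable model rather than the plain feedforward stack shown here.

```python
import torch
import torch.nn as nn

# Hedged sketch of conditioning a policy on an inferred environment latent.
# A context encoder summarizes recent transitions into z; the policy takes
# (s, z). Architectures and shapes are assumptions for illustration.

class ContextEncoder(nn.Module):
    def __init__(self, transition_dim=20, latent_dim=5, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(transition_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, latent_dim)
    def forward(self, transitions):          # (batch, window, transition_dim)
        _, h = self.rnn(transitions)
        return self.head(h.squeeze(0))       # inferred environment latent z

policy = nn.Sequential(nn.Linear(8 + 5, 64), nn.Tanh(), nn.Linear(64, 2))
encoder = ContextEncoder()

recent = torch.randn(1, 50, 20)              # last 50 (s, a, r, s') tuples
z = encoder(recent)
action = policy(torch.cat([torch.randn(1, 8), z], dim=-1))
print(action.shape)  # torch.Size([1, 2])
```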
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian process regression and receding horizon control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
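A hedged sketch of the combination on a 1-D toy system follows (the paper uses cart-pole swing-up and an autonomous racing task): fit a GP to observed transitions, then replan at every step by rolling candidate action sequences through the GP mean. The toy dynamics, default kernel, and planner parameters are assumptions for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hedged sketch of GP-based model learning plus receding-horizon control on a
# 1-D toy system. The paper targets cart-pole swing-up; this toy setup, the
# kernel defaults, and the random-shooting planner are assumptions.

def true_dynamics(x, u):
    return x + 0.1 * u - 0.05 * np.sin(x)    # unknown to the controller

# Fit a GP to observed (x, u) -> x' transitions.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))        # columns: state, action
Y = true_dynamics(X[:, 0], X[:, 1])
gp = GaussianProcessRegressor().fit(X, Y)

def receding_horizon_action(x0, horizon=5, n_candidates=128, target=1.0):
    """Sample action sequences, roll out the GP mean, and apply the first
    action of the best sequence (then replan at the next step)."""
    u_seqs = rng.uniform(-1, 1, size=(n_candidates, horizon))
    x = np.full(n_candidates, x0)
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        x = gp.predict(np.column_stack([x, u_seqs[:, t]]))
        cost += (x - target) ** 2
    return u_seqs[cost.argmin(), 0]

x = 0.0
for step in range(10):
    u = receding_horizon_action(x)
    x = true_dynamics(x, u)
print(f"final state: {x:.3f}")               # should approach the target
```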
arXiv Detail & Related papers (2020-04-07T15:51:34Z)