Related papers: Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks

URL: http://arxiv.org/abs/2303.16563v2
Date: Mon, 4 Dec 2023 14:53:15 GMT
Title: Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks
Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu
Abstract summary: We study building multi-task agents in open-world environments. We convert the multi-task learning problem into learning basic skills and planning over the skills. Our method accomplishes 40 diverse Minecraft tasks, many of which require executing more than 10 skills in sequence.
Score: 31.084848672383185
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study building multi-task agents in open-world environments. Without human demonstrations, learning to accomplish long-horizon tasks in a large open-world environment with reinforcement learning (RL) is extremely inefficient. To tackle this challenge, we convert the multi-task learning problem into learning basic skills and planning over the skills. Using the popular open-world game Minecraft as the testbed, we propose three types of fine-grained basic skills, and use RL with intrinsic rewards to acquire skills. A novel Finding-skill that performs exploration to find diverse items provides better initialization for other skills, improving the sample efficiency for skill learning. In skill planning, we leverage the prior knowledge in Large Language Models to find the relationships between skills and build a skill graph. When the agent is solving a task, our skill search algorithm walks on the skill graph and generates the proper skill plans for the agent. In experiments, our method accomplishes 40 diverse Minecraft tasks, many of which require executing more than 10 skills in sequence. Our method outperforms baselines by a large margin and is the most sample-efficient demonstration-free RL method to solve Minecraft Tech Tree tasks. The project's website and code can be found at https://sites.google.com/view/plan4mc.
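To make the idea of "walking on a skill graph" more concrete, here is a minimal Python sketch under stated assumptions. The skill names, recipe quantities, the `Skill` record and the `plan` function are all hypothetical illustrations, not the Plan4MC code or its exact search algorithm. The graph is hand-written here, whereas the abstract says Plan4MC builds it by querying an LLM. Each skill lists the item it yields, the ingredients it consumes, and the tools it needs; `plan` searches depth-first from the target item, appending skills to a plan while updating a simulated inventory.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Skill:
    """One node of a hypothetical skill graph (illustrative only)."""
    name: str
    output: str                                   # item the skill produces
    yield_count: int = 1                          # units produced per execution
    consumes: dict = field(default_factory=dict)  # ingredients used up
    tools: tuple = ()                             # items required but not used up


# Hand-written, illustrative graph; Plan4MC instead derives skill relations with an LLM.
SKILLS = [
    Skill("harvest_log", "log"),
    Skill("craft_planks", "planks", 4, {"log": 1}),
    Skill("craft_stick", "stick", 4, {"planks": 2}),
    Skill("craft_table", "crafting_table", 1, {"planks": 4}),
    Skill("craft_wooden_pickaxe", "wooden_pickaxe", 1,
          {"planks": 3, "stick": 2}, ("crafting_table",)),
    Skill("mine_cobblestone", "cobblestone", 1, {}, ("wooden_pickaxe",)),
    Skill("craft_stone_pickaxe", "stone_pickaxe", 1,
          {"cobblestone": 3, "stick": 2}, ("crafting_table",)),
]
# Index skills by the item they produce so the search can look up "how do I get X?".
SKILL_GRAPH = {skill.output: skill for skill in SKILLS}


def plan(target, count, inventory, graph, steps):
    """Depth-first search: append skill names to `steps` until the simulated
    `inventory` (a Counter) holds at least `count` units of `target`."""
    while inventory[target] < count:
        skill = graph[target]
        # Prerequisites: one unit of each tool plus every consumed ingredient.
        needs = Counter({tool: 1 for tool in skill.tools})
        needs.update(skill.consumes)
        # Gathering one prerequisite can use up another (sticks are made from
        # planks), so repeat until all prerequisites are satisfied together.
        while any(inventory[item] < n for item, n in needs.items()):
            for item, n in needs.items():
                plan(item, n, inventory, graph, steps)
        # "Execute" the skill: spend ingredients, keep tools, gain the output.
        inventory.subtract(skill.consumes)
        inventory[target] += skill.yield_count
        steps.append(skill.name)


if __name__ == "__main__":
    inventory, steps = Counter(), []
    plan("stone_pickaxe", 1, inventory, SKILL_GRAPH, steps)
    print(len(steps), steps)
```

Starting from an empty inventory, this sketch produces a 13-skill plan for `stone_pickaxe` that begins with `harvest_log` and ends with `craft_stone_pickaxe`, which illustrates why such tasks need long skill sequences. The greedy search returns a valid plan but not necessarily the shortest one (it harvests logs on three separate occasions).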

Related papers

Efficient Skill Discovery via Regret-Aware Optimization [37.27136009415794]
We frame skill discovery as a min-max game of skill generation and policy learning. We propose a regret-aware method on top of temporal representation learning. Our method achieves a 15% zero-shot improvement in high-dimensional environments.
arXiv (2025-06-26T06:45:59Z)
Pretrained Bayesian Non-parametric Knowledge Prior in Robotic Long-Horizon Reinforcement Learning [10.598207472087578]
Reinforcement learning (RL) methods typically learn new tasks from scratch, often disregarding prior knowledge that could accelerate the learning process. This work introduces a method that models potential primitive skill motions as having non-parametric properties with an unknown number of underlying features. We utilize a non-parametric model, specifically Dirichlet Process Mixtures, enhanced with birth and merge, to pre-train a skill prior that effectively captures the diverse nature of skills.
arXiv (2025-03-27T20:43:36Z)
SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation [58.14969377419633]
We propose SPIRE, a system that first decomposes tasks into smaller learning subproblems and second combines imitation and reinforcement learning to maximize their strengths. We find that SPIRE outperforms prior approaches that integrate imitation learning, reinforcement learning, and planning by 35% to 50% in average task performance.
arXiv (2024-10-23T17:42:07Z)
SkillMimic: Learning Basketball Interaction Skills from Demonstrations [85.23012579911378]
We introduce SkillMimic, a unified data-driven framework that fundamentally changes how agents learn interaction skills. Our key insight is that a unified human-object interaction (HOI) imitation reward can effectively capture the essence of diverse interaction patterns from HOI datasets. For evaluation, we collect and introduce two basketball datasets containing approximately 35 minutes of diverse basketball skills.
arXiv (2024-08-12T15:19:04Z)
Agentic Skill Discovery [19.5703917813767]
Language-conditioned robotic skills make it possible to apply the high-level reasoning of Large Language Models (LLMs) to low-level robotic control. A remaining challenge is to acquire a diverse set of fundamental skills. We introduce a novel framework for skill discovery that is entirely driven by LLMs.
arXiv (2024-05-23T19:44:03Z)
Choreographer: Learning and Adapting Skills in Imagination [60.09911483010824]
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination. Our method decouples the exploration and skill learning processes and can discover skills in the latent state space of the model. Choreographer is able to learn skills both from offline data and by collecting data simultaneously with an exploration policy.
arXiv (2022-11-23T23:31:14Z)
Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics [18.546688182454236]
Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. We propose accelerating exploration in the skill space using state-conditioned generative models. We validate our approach across four challenging manipulation tasks, demonstrating our ability to learn across task variations.
arXiv (2022-11-04T02:42:17Z)
Lipschitz-constrained Unsupervised Skill Discovery [91.51219447057817]
Lipschitz-constrained Skill Discovery (LSD) encourages the agent to discover more diverse, dynamic, and far-reaching skills. LSD outperforms previous approaches in terms of skill diversity, state space coverage, and performance on seven downstream tasks.
arXiv (2022-02-02T08:29:04Z)
Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks [85.56153200251713]
We introduce EMBR, a model-based RL method for learning primitive skills that are suitable for completing long-horizon visuomotor tasks. On a Franka Emika robot arm, we find that EMBR enables the robot to complete three long-horizon visuomotor tasks with an 85% success rate.
arXiv (2021-09-21T16:48:07Z)
Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft [18.845438529816004]
We explore curriculum learning in a complex, visual domain with many hard exploration challenges: Minecraft. We find that learning progress is a reliable measure of learnability for automatically constructing an effective curriculum.
arXiv (2021-06-28T17:50:40Z)
Discovering Generalizable Skills via Automated Generation of Diverse Tasks [82.16392072211337]
We propose a method to discover generalizable skills via automated generation of a diverse set of tasks. As opposed to prior work on unsupervised discovery of skills, our method pairs each skill with a unique task produced by a trainable task generator. A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective. The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks.
arXiv (2021-06-26T03:41:51Z)
Accelerating Reinforcement Learning with Learned Skill Priors [20.268358783821487]
Most modern reinforcement learning approaches learn every task from scratch. One approach for leveraging prior knowledge is to transfer skills learned on prior tasks to the new task. We show that learned skill priors are essential for effective skill transfer from rich datasets.
arXiv (2020-10-22T17:59:51Z)
Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks. We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible. We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv (2020-04-27T17:38:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.