Sample Efficient Reinforcement Learning by Automatically Learning to
Compose Subtasks
- URL: http://arxiv.org/abs/2401.14226v1
- Date: Thu, 25 Jan 2024 15:06:40 GMT
- Title: Sample Efficient Reinforcement Learning by Automatically Learning to
Compose Subtasks
- Authors: Shuai Han, Mehdi Dastani, Shihan Wang
- Abstract summary: We propose an RL algorithm that automatically structure the reward function for sample efficiency, given a set of labels that signify subtasks.
We evaluate our algorithm in a variety of sparse-reward environments.
- Score: 3.1594865504808944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving sample efficiency is central to Reinforcement Learning (RL),
especially in environments where the rewards are sparse. Some recent approaches
have proposed to specify reward functions as manually designed or learned
reward structures whose integrations in the RL algorithms are claimed to
significantly improve the learning efficiency. Manually designed reward
structures can suffer from inaccuracy and existing automatically learning
methods are often computationally intractable for complex tasks. The
integration of inaccurate or partial reward structures in RL algorithms fail to
learn optimal policies. In this work, we propose an RL algorithm that can
automatically structure the reward function for sample efficiency, given a set
of labels that signify subtasks. Given such minimal knowledge about the task,
we train a high-level policy that selects optimal sub-tasks in each state
together with a low-level policy that efficiently learns to complete each
sub-task. We evaluate our algorithm in a variety of sparse-reward environments.
The experiment results show that our approach significantly outperforms the
state-of-art baselines as the difficulty of the task increases.
Related papers
- Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping [16.5526277899717]
This study aims to design a multi-agent cooperative algorithm with logic reward shaping.
Experiments have been conducted on various types of tasks in the Minecraft-like environment.
arXiv Detail & Related papers (2024-11-02T09:03:23Z) - Reward-Machine-Guided, Self-Paced Reinforcement Learning [30.42334205249944]
We develop a self-paced reinforcement learning algorithm guided by reward machines.
The proposed algorithm achieves optimal behavior reliably even in cases in which existing baselines cannot make any meaningful progress.
It also decreases the curriculum length and reduces the variance in the curriculum generation process by up to one-fourth and four orders of magnitude, respectively.
arXiv Detail & Related papers (2023-05-25T22:13:37Z) - Human-Inspired Framework to Accelerate Reinforcement Learning [1.6317061277457001]
Reinforcement learning (RL) is crucial for data science decision-making but suffers from sample inefficiency.
This paper introduces a novel human-inspired framework to enhance RL algorithm sample efficiency.
arXiv Detail & Related papers (2023-02-28T13:15:04Z) - Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z) - CLUTR: Curriculum Learning via Unsupervised Task Representation Learning [130.79246770546413]
CLUTR is a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization.
We show CLUTR outperforms PAIRED, a principled and popular UED method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments.
arXiv Detail & Related papers (2022-10-19T01:45:29Z) - Learning Action Translator for Meta Reinforcement Learning on
Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z) - The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Model-based Adversarial Meta-Reinforcement Learning [38.28304764312512]
We propose Model-based Adversarial Meta-Reinforcement Learning (AdMRL)
AdMRL aims to minimize the worst-case sub-optimality gap across all tasks in a family of tasks.
We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy in the worst-case performance over all tasks.
arXiv Detail & Related papers (2020-06-16T02:21:49Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask
Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.