Related papers: Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks

URL: http://arxiv.org/abs/2401.14226v1
Date: Thu, 25 Jan 2024 15:06:40 GMT
Title: Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks
Authors: Shuai Han, Mehdi Dastani, Shihan Wang
Abstract summary: We propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks. We evaluate our algorithm in a variety of sparse-reward environments.
Score: 3.1594865504808944
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Improving sample efficiency is central to Reinforcement Learning (RL), especially in environments where the rewards are sparse. Some recent approaches have proposed to specify reward functions as manually designed or learned reward structures whose integration into RL algorithms is claimed to significantly improve learning efficiency. Manually designed reward structures can suffer from inaccuracy, and existing automatic learning methods are often computationally intractable for complex tasks. RL algorithms that integrate inaccurate or partial reward structures fail to learn optimal policies. In this work, we propose an RL algorithm that can automatically structure the reward function for sample efficiency, given a set of labels that signify subtasks. Given such minimal knowledge about the task, we train a high-level policy that selects optimal subtasks in each state together with a low-level policy that efficiently learns to complete each subtask. We evaluate our algorithm in a variety of sparse-reward environments. The experimental results show that our approach significantly outperforms the state-of-the-art baselines as the difficulty of the task increases.

Related papers

Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks [2.3031174164121127]
We propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by formulas. We develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process. Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms.
arXiv Detail & Related papers (2024-12-14T18:04:18Z)
Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping [16.5526277899717]
This study aims to design a multi-agent cooperative algorithm with logic reward shaping. Experiments have been conducted on various types of tasks in a Minecraft-like environment.
arXiv Detail & Related papers (2024-11-02T09:03:23Z)
Reward-Machine-Guided, Self-Paced Reinforcement Learning [30.42334205249944]
We develop a self-paced reinforcement learning algorithm guided by reward machines. The proposed algorithm achieves optimal behavior reliably even in cases in which existing baselines cannot make any meaningful progress. It also decreases the curriculum length and reduces the variance in the curriculum generation process by up to one-fourth and four orders of magnitude, respectively.
arXiv Detail & Related papers (2023-05-25T22:13:37Z)
Human-Inspired Framework to Accelerate Reinforcement Learning [1.6317061277457001]
Reinforcement learning (RL) is crucial for data science decision-making but suffers from sample inefficiency. This paper introduces a novel human-inspired framework to enhance RL algorithm sample efficiency.
arXiv Detail & Related papers (2023-02-28T13:15:04Z)
Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning. The algorithm selects the order of tasks that provide the fastest learning for agents. We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z)
CLUTR: Curriculum Learning via Unsupervised Task Representation Learning [130.79246770546413]
CLUTR is a novel curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. We show CLUTR outperforms PAIRED, a principled and popular UED method, in terms of generalization and sample efficiency in the challenging CarRacing and navigation environments.
arXiv Detail & Related papers (2022-10-19T01:45:29Z)
Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks. We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy. We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function. We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems. We propose a novel method for computing the normalized maximum likelihood (NML) distribution. We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
Model-based Adversarial Meta-Reinforcement Learning [38.28304764312512]
We propose Model-based Adversarial Meta-Reinforcement Learning (AdMRL). AdMRL aims to minimize the worst-case sub-optimality gap across all tasks in a family of tasks. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy in the worst-case performance over all tasks.
arXiv Detail & Related papers (2020-06-16T02:21:49Z)
Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph. Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference. Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.