Heuristic-Guided Reinforcement Learning
- URL: http://arxiv.org/abs/2106.02757v1
- Date: Sat, 5 Jun 2021 00:04:09 GMT
- Title: Heuristic-Guided Reinforcement Learning
- Authors: Ching-An Cheng, Andrey Kolobov, Adith Swaminathan
- Abstract summary: Tabula rasa RL algorithms require environment interactions or computation that scales with the horizon of the decision-making task.
Our framework can be viewed as a horizon-based regularization for controlling bias and variance in RL under a finite interaction budget.
In particular, we introduce the novel concept of an "improvable heuristic": a heuristic that allows an RL agent to extrapolate beyond its prior knowledge.
- Score: 31.056460162389783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a framework for accelerating reinforcement learning (RL)
algorithms by heuristics constructed from domain knowledge or offline data.
Tabula rasa RL algorithms require environment interactions or computation that
scales with the horizon of the sequential decision-making task. Using our
framework, we show how heuristic-guided RL induces a much shorter-horizon
subproblem that provably solves the original task. Our framework can be viewed
as a horizon-based regularization for controlling bias and variance in RL under
a finite interaction budget. On the theoretical side, we characterize
properties of a good heuristic and its impact on RL acceleration. In
particular, we introduce the novel concept of an "improvable heuristic" -- a
heuristic that allows an RL agent to extrapolate beyond its prior knowledge. On
the empirical side, we instantiate our framework to accelerate several
state-of-the-art algorithms in simulated robotic control tasks and procedurally
generated games. Our framework complements the rich literature on warm-starting
RL with expert demonstrations or exploratory datasets, and introduces a
principled method for injecting prior knowledge into RL.
Related papers
- The Virtues of Pessimism in Inverse Reinforcement Learning [38.98656220917943]
Inverse Reinforcement Learning is a powerful framework for learning complex behaviors from expert demonstrations.
It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL.
We consider an alternative approach to speeding up the RL in IRL: pessimism, i.e., staying close to the expert's data distribution, instantiated via the use of offline RL algorithms.
Date: 2024-02-04T21:22:29Z
- Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning [50.976910714839065]
Context-based offline meta-RL (COMRL), a popular paradigm, aims to learn a universal policy conditioned on effective task representations.
We show that COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $\boldsymbol{M}$ and its latent representation $\boldsymbol{Z}$ by implementing various approximate bounds.
Based on the theoretical insight and the information bottleneck principle, we arrive at a novel algorithm dubbed UNICORN, which exhibits remarkable generalization across a broad spectrum of RL benchmarks.
Date: 2024-02-04T09:58:42Z
- A User Study on Explainable Online Reinforcement Learning for Adaptive Systems [0.802904964931021]
Online reinforcement learning (RL) is increasingly used for realizing adaptive systems in the presence of design time uncertainty.
With deep RL gaining interest, the learned knowledge is no longer represented explicitly but is encoded in a neural network.
XRL-DINE provides visual insights into why certain decisions were made at important time points.
Date: 2023-07-09T05:12:42Z
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pairwise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of hidden reward functions.
Date: 2023-05-29T15:00:09Z
- A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges [38.70863329476517]
Reinforcement Learning (RL) is a popular machine learning paradigm where intelligent agents interact with the environment to fulfill a long-term goal.
Despite the encouraging results achieved, the deep neural network-based backbone is widely regarded as a black box that keeps practitioners from trusting and employing trained agents in realistic scenarios where high security and reliability are essential.
To alleviate this issue, a large body of literature has sought to shed light on the inner workings of intelligent agents, either by constructing intrinsic interpretability or through post-hoc explainability.
Date: 2022-11-12T13:52:06Z
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method for solving the Unsupervised Reinforcement Learning Benchmark from pixels that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
Date: 2022-09-24T14:22:29Z
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
Date: 2022-04-18T23:09:23Z
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
Date: 2022-04-05T17:25:22Z
- POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) aims to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance that leverages expert demonstrations to improve SRL interpretations.
We empirically verify that POAR efficiently handles high-dimensional tasks and facilitates training real-life robots directly from scratch.
Date: 2021-09-17T16:52:03Z
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
Date: 2020-12-21T18:28:17Z
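The Jump-Start Reinforcement Learning entry in the list above says only that JSRL "employs two policies". As we understand that paper, the mechanism works in three parts:

- A guide policy, for example one derived from demonstrations or a pre-existing controller, acts for the first h steps of each episode.
- An exploration policy takes over after those h steps.
- The value of h is shrunk over a curriculum.

This is close in spirit to using prior knowledge to shorten the effective horizon. The sketch below illustrates only that roll-in structure on a made-up chain environment. All names (`chain_step`, `guide_policy`, `jump_start_episode`, `curriculum`) and the linear schedule are hypothetical, and no learning update is shown.

```python
import random

GOAL = 10  # hypothetical toy chain: states 0..GOAL, reward 1.0 on reaching GOAL


def chain_step(state, action):
    """Deterministic toy environment; action is +1 (right) or -1 (left)."""
    nxt = min(max(state + action, 0), GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def guide_policy(state):
    """Stand-in for prior knowledge (e.g. a policy fit to demonstrations)."""
    return +1


def make_explore_policy(seed):
    """Stand-in for the learner's policy: here just uniformly random."""
    rng = random.Random(seed)
    return lambda state: rng.choice((-1, +1))


def jump_start_episode(guide, explore, h, max_steps=50):
    """Roll in with `guide` for the first h steps, then let `explore` act.

    Each transition is tagged with whether the guide chose the action, so a
    learner can tell which part of the episode the exploration policy drove.
    """
    state, traj = 0, []
    for t in range(max_steps):
        use_guide = t < h
        action = (guide if use_guide else explore)(state)
        nxt, reward, done = chain_step(state, action)
        traj.append((state, action, reward, use_guide))
        state = nxt
        if done:
            break
    return traj


def curriculum(start_h, stages):
    """Hypothetical linear schedule shrinking the guide horizon to 0."""
    return [start_h * (stages - k) // stages for k in range(stages + 1)]


explore = make_explore_policy(seed=0)
for h in curriculum(start_h=GOAL, stages=5):
    traj = jump_start_episode(guide_policy, explore, h)
    print(h, len(traj), sum(step[2] for step in traj))
```

Early in the curriculum the guide does most of the work and episodes reliably reach the goal. As h shrinks, the exploration policy must cover a growing share of each episode, starting from states the guide has already reached.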