Reinforcement Learning for Classical Planning: Viewing Heuristics as
Dense Reward Generators
- URL: http://arxiv.org/abs/2109.14830v1
- Date: Thu, 30 Sep 2021 03:36:01 GMT
- Title: Reinforcement Learning for Classical Planning: Viewing Heuristics as
Dense Reward Generators
- Authors: Clement Gehring, Masataro Asai, Rohan Chitnis, Tom Silver, Leslie Pack
Kaelbling, Shirin Sohrabi, Michael Katz
- Abstract summary: We propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL.
These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue and enable our RL agent to learn domain-specific value functions as residuals.
We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL.
- Score: 54.6441336539206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in reinforcement learning (RL) have led to a growing interest
in applying RL to classical planning domains or applying classical planning
methods to some complex RL domains. However, the long-horizon goal-based
problems found in classical planning lead to sparse rewards for RL, making
direct application inefficient. In this paper, we propose to leverage
domain-independent heuristic functions commonly used in the classical planning
literature to improve the sample efficiency of RL. These classical heuristics
act as dense reward generators to alleviate the sparse-rewards issue and enable
our RL agent to learn domain-specific value functions as residuals on these
heuristics, making learning easier. Correct application of this technique
requires consolidating the discounted metric used in RL and the non-discounted
metric used in heuristics. We implement the value functions using Neural Logic
Machines, a neural network architecture designed for grounded first-order logic
inputs. We demonstrate on several classical planning domains that using
classical heuristics for RL allows for good sample efficiency compared to
sparse-reward RL. We further show that our learned value functions generalize
to novel problem instances in the same domain.
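Below is a minimal sketch of the core idea in the abstract, assuming a Gym-style interaction loop: the planning heuristic h(s) is used as a potential function that turns the sparse goal reward into a dense shaping signal, and the value function is modeled as a residual on -h(s). The names (`heuristic`, `residual_net`, `shaped_reward`) and the simple potential-based shaping used here to reconcile the discounted and non-discounted metrics are illustrative assumptions, not the authors' implementation (which represents the value function with Neural Logic Machines over grounded first-order logic inputs).
```python
# Illustrative sketch only: a classical planning heuristic used as a dense
# reward generator and as the base of a residual value function.
# `heuristic` is any callable estimating cost-to-go (e.g. an h^FF or h^add
# implementation); `residual_net` is any learned function of the state.

GAMMA = 0.99        # RL discount factor
STEP_REWARD = -1.0  # unit action cost; reaching the goal ends the episode

def shaped_reward(s, s_next, done, heuristic):
    """Potential-based shaping with potential phi(s) = -h(s).

    r'(s, a, s') = r + gamma * phi(s') - phi(s) densifies the sparse goal
    signal while preserving optimal policies (Ng et al., 1999).
    """
    phi_s = -heuristic(s)
    phi_next = 0.0 if done else -heuristic(s_next)
    return STEP_REWARD + GAMMA * phi_next - phi_s

def residual_value(s, heuristic, residual_net):
    """V(s) = -h(s) + f_theta(s): the network only learns the
    domain-specific correction, not the full cost-to-go."""
    return -heuristic(s) + residual_net(s)

def td_target(s, s_next, done, heuristic, residual_net):
    """One-step TD target for training f_theta under the shaped objective."""
    r = shaped_reward(s, s_next, done, heuristic)
    v_next = 0.0 if done else residual_value(s_next, heuristic, residual_net)
    return r + GAMMA * v_next

if __name__ == "__main__":
    # Toy check on a 1-D chain whose goal is state 5, with h(s) = 5 - s
    # and an untrained (zero) residual.
    h = lambda s: 5 - s
    f = lambda s: 0.0
    print(shaped_reward(2, 3, done=False, heuristic=h))  # progress toward the goal
    print(residual_value(2, h, f))                       # -h(2) = -3
```
Setting the potential to zero at terminal states keeps the shaping consistent with the episodic, discounted objective; the paper treats this reconciliation of the discounted RL metric with the non-discounted heuristic metric in more detail.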
Related papers
- Reinforcement Learning for Dynamic Memory Allocation [0.0]
We present a framework in which an RL agent continuously learns from interactions with the system to improve memory management tactics.
Our results show that RL can successfully train agents that can match and surpass traditional allocation strategies.
We also explore the potential of history-aware policies that leverage previous allocation requests to enhance the allocator's ability to handle complex request patterns.
arXiv Detail & Related papers (2024-10-20T20:13:46Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term, compared to RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative reward, an objective that is a linear function of the state-action occupancy measure (Linear RL).
Many supervised and unsupervised RL problems are not covered by the Linear RL framework.
We derive the policy gradient theorem for RL with general utilities.
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
- Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z)
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games [60.550128966505625]
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents.
Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods.
Existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm.
The second one is an online mixing method.
arXiv Detail & Related papers (2022-03-22T18:10:32Z)
- Computational Benefits of Intermediate Rewards for Hierarchical Planning [42.579256546135866]
We show that using intermediate rewards reduces the computational complexity of finding a successful policy but does not guarantee finding the shortest path.
We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and other popular deep RL algorithms.
arXiv Detail & Related papers (2021-07-08T16:39:13Z)
- Heuristic-Guided Reinforcement Learning [31.056460162389783]
Tabula rasa RL algorithms require environment interactions or computation that scales with the horizon of the decision-making task.
Our framework can be viewed as a horizon-based regularization for controlling bias and variance in RL under a finite interaction budget.
In particular, we introduce the novel concept of an "improvable heuristic": one that allows an RL agent to extrapolate beyond its prior knowledge.
arXiv Detail & Related papers (2021-06-05T00:04:09Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that can be highly competitive against manually designed policies, while also verifying previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z)