Reinforcement Learning for Classical Planning: Viewing Heuristics as
Dense Reward Generators
- URL: http://arxiv.org/abs/2109.14830v1
- Date: Thu, 30 Sep 2021 03:36:01 GMT
- Title: Reinforcement Learning for Classical Planning: Viewing Heuristics as
Dense Reward Generators
- Authors: Clement Gehring, Masataro Asai, Rohan Chitnis, Tom Silver, Leslie Pack
Kaelbling, Shirin Sohrabi, Michael Katz
- Abstract summary: We propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL.
These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue and enable our RL agent to learn domain-specific value functions as residuals.
We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL.
- Score: 54.6441336539206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in reinforcement learning (RL) have led to a growing interest
in applying RL to classical planning domains or applying classical planning
methods to some complex RL domains. However, the long-horizon goal-based
problems found in classical planning lead to sparse rewards for RL, making
direct application inefficient. In this paper, we propose to leverage
domain-independent heuristic functions commonly used in the classical planning
literature to improve the sample efficiency of RL. These classical heuristics
act as dense reward generators to alleviate the sparse-rewards issue and enable
our RL agent to learn domain-specific value functions as residuals on these
heuristics, making learning easier. Correct application of this technique
requires consolidating the discounted metric used in RL and the non-discounted
metric used in heuristics. We implement the value functions using Neural Logic
Machines, a neural network architecture designed for grounded first-order logic
inputs. We demonstrate on several classical planning domains that using
classical heuristics for RL allows for good sample efficiency compared to
sparse-reward RL. We further show that our learned value functions generalize
to novel problem instances in the same domain.
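Below is a minimal sketch of the core idea in the abstract, assuming a Gym-style interaction loop: the planning heuristic h(s) is used as a potential function that turns the sparse goal reward into a dense shaping signal, and the value function is modeled as a residual on -h(s). The names (`heuristic`, `residual_net`, `shaped_reward`) and the simple potential-based shaping used here to reconcile the discounted and non-discounted metrics are illustrative assumptions, not the authors' implementation (which represents the value function with Neural Logic Machines over grounded first-order logic inputs).
```python
# Illustrative sketch only: a classical planning heuristic used as a dense
# reward generator and as the base of a residual value function.
# `heuristic` is any callable estimating cost-to-go (e.g. an h^FF or h^add
# implementation); `residual_net` is any learned function of the state.

GAMMA = 0.99        # RL discount factor
STEP_REWARD = -1.0  # unit action cost; reaching the goal ends the episode

def shaped_reward(s, s_next, done, heuristic):
    """Potential-based shaping with potential phi(s) = -h(s).

    r'(s, a, s') = r + gamma * phi(s') - phi(s) densifies the sparse goal
    signal while preserving optimal policies (Ng et al., 1999).
    """
    phi_s = -heuristic(s)
    phi_next = 0.0 if done else -heuristic(s_next)
    return STEP_REWARD + GAMMA * phi_next - phi_s

def residual_value(s, heuristic, residual_net):
    """V(s) = -h(s) + f_theta(s): the network only learns the
    domain-specific correction, not the full cost-to-go."""
    return -heuristic(s) + residual_net(s)

def td_target(s, s_next, done, heuristic, residual_net):
    """One-step TD target for training f_theta under the shaped objective."""
    r = shaped_reward(s, s_next, done, heuristic)
    v_next = 0.0 if done else residual_value(s_next, heuristic, residual_net)
    return r + GAMMA * v_next

if __name__ == "__main__":
    # Toy check on a 1-D chain whose goal is state 5, with h(s) = 5 - s
    # and an untrained (zero) residual.
    h = lambda s: 5 - s
    f = lambda s: 0.0
    print(shaped_reward(2, 3, done=False, heuristic=h))  # progress toward the goal
    print(residual_value(2, h, f))                       # -h(2) = -3
```
Setting the potential to zero at terminal states keeps the shaping consistent with the episodic, discounted objective; the paper treats this reconciliation of the discounted RL metric with the non-discounted heuristic metric in more detail.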
Related papers
- Reinforcement Learning for Dynamic Memory Allocation [0.0]
We present a framework in which an RL agent continuously learns from interactions with the system to improve memory management tactics.
Our results show that RL can successfully train agents that can match and surpass traditional allocation strategies.
We also explore the potential of history-aware policies that leverage previous allocation requests to enhance the allocator's ability to handle complex request patterns.
arXiv Detail & Related papers (2024-10-20T20:13:46Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, in the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term, compared to RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative reward, an objective that is a linear function of the state-action occupancy measure (Linear RL).
Many supervised and unsupervised RL problems are not covered by the Linear RL framework.
We derive the policy gradient theorem for RL with general utilities.
arXiv Detail & Related papers (2022-10-03T14:57:46Z)
- Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z)
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games [60.550128966505625]
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents.
Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods.
Existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm.
The second one is an online mixing method.
arXiv Detail & Related papers (2022-03-22T18:10:32Z)
- Computational Benefits of Intermediate Rewards for Hierarchical Planning [42.579256546135866]
We show that using intermediate rewards reduces the computational complexity of finding a successful policy but does not guarantee finding the shortest path.
We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and other popular deep RL algorithms.
arXiv Detail & Related papers (2021-07-08T16:39:13Z)
- Heuristic-Guided Reinforcement Learning [31.056460162389783]
Tabula rasa RL algorithms require environment interactions or computation that scales with the horizon of the decision-making task.
Our framework can be viewed as a horizon-based regularization for controlling bias and variance in RL under a finite interaction budget.
In particular, we introduce the novel concept of an "improvable heuristic": one that allows an RL agent to extrapolate beyond its prior knowledge.
arXiv Detail & Related papers (2021-06-05T00:04:09Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that can be highly competitive against manually designed policies, while also verifying previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z)