Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains
- URL: http://arxiv.org/abs/2404.02499v2
- Date: Mon, 13 May 2024 09:20:26 GMT
- Title: Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains
- Authors: Till Hofmann, Hector Geffner,
- Abstract summary: General policies represent reactive strategies for solving large families of planning problems.
We extend the formulations and the resulting methods for learning general policies over fully observable, non-deterministic domains.
- Score: 12.730070122798459
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning general policies over fully observable, non-deterministic (FOND) domains. We also evaluate the resulting approach experimentally over a number of benchmark domains in FOND planning, present the general policies that result in some of these domains, and prove their correctness. The method for learning general policies for FOND planning can actually be seen as an alternative FOND planning method that searches for solutions, not in the given state space but in an abstract space defined by features that must be learned as well.
Related papers
- Landscape of Policy Optimization for Finite Horizon MDPs with General State and Action [10.219627570276689]
We develop a framework for a class of Markov Decision Processes with general state and spaces.
We show that gradient methods converge to the globally optimal policy with a nonasymptomatic condition.
Our result establishes first complexity for multi-period inventory systems.
arXiv Detail & Related papers (2024-09-25T17:56:02Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-ite convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy
Decomposition [40.851324484481275]
We study off-policy learning of contextual bandit policies in large discrete action spaces.
We propose a novel two-stage algorithm, called Policy Optimization via Two-Stage Policy Decomposition.
We show that POTEC provides substantial improvements in OPL effectiveness particularly in large and structured action spaces.
arXiv Detail & Related papers (2024-02-09T03:01:13Z) - Policy Optimization over General State and Action Spaces [3.722665817361884]
Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging.
We first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces.
We introduce new approaches to incorporate function approximation into this method, so that we do not need to use explicit policy parameterization at all.
arXiv Detail & Related papers (2022-11-30T03:44:44Z) - Representation Learning for Continuous Action Spaces is Beneficial for
Efficient Policy Learning [64.14557731665577]
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL)
In this paper, we propose an efficient policy learning method in latent state and action spaces.
The effectiveness of the proposed method is demonstrated by MountainCar,CarRacing and Cheetah experiments.
arXiv Detail & Related papers (2022-11-23T19:09:37Z) - Constructing a Good Behavior Basis for Transfer using Generalized Policy
Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range learning rates, in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - Learning and Planning in Complex Action Spaces [19.33000677254158]
We propose a general framework to reason in a principled way about policy evaluation and improvement.
This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm.
We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains.
arXiv Detail & Related papers (2021-04-13T15:48:48Z) - Learning General Policies from Small Examples Without Supervision [18.718037284357834]
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once.
It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans.
In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner.
arXiv Detail & Related papers (2021-01-03T19:44:13Z) - Policy Evaluation Networks [50.53250641051648]
We introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding.
Our empirical results demonstrate that combining these three elements can produce policies that outperform those that generated the training data.
arXiv Detail & Related papers (2020-02-26T23:00:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.