Policy-Space Search: Equivalences, Improvements, and Compression
- URL: http://arxiv.org/abs/2403.19883v1
- Date: Thu, 28 Mar 2024 23:40:20 GMT
- Title: Policy-Space Search: Equivalences, Improvements, and Compression
- Authors: Frederico Messa, André Grahl Pereira
- Abstract summary: Fully-observable non-deterministic (FOND) planning is at the core of artificial intelligence planning with uncertainty.
A* with Non-Determinism (AND*) is a FOND planner that generalizes A* for FOND planning.
- Score: 5.801044612920816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully-observable non-deterministic (FOND) planning is at the core of artificial intelligence planning with uncertainty. It models uncertainty through actions with non-deterministic effects. A* with Non-Determinism (AND*) (Messa and Pereira, 2023) is a FOND planner that generalizes A* (Hart et al., 1968) for FOND planning. It searches for a solution policy by performing an explicit heuristic search on the policy space of the FOND task. In this paper, we study and improve the performance of the policy-space search performed by AND*. We present a polynomial-time procedure that constructs a solution policy given just the set of states that should be mapped. This procedure, together with a better understanding of the structure of FOND policies, allows us to present three concepts of equivalences between policies. We use policy equivalences to prune part of the policy search space, making AND* substantially more effective in solving FOND tasks. We also study the impact of taking into account structural state-space symmetries to strengthen the detection of equivalent policies and the impact of performing the search with satisficing techniques. We apply a recent technique from the group theory literature to better compute structural state-space symmetries. Finally, we present a solution compressor that, given a policy defined over complete states, finds a policy that unambiguously represents it using the minimum number of partial states. AND* with the introduced techniques generates, on average, two orders of magnitude fewer policies to solve FOND tasks. These techniques allow explicit policy-space search to be competitive in terms of both coverage and solution compactness with other state-of-the-art FOND planners.
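To make the search concrete, here is a minimal Python sketch of explicit policy-space search with equivalence-based pruning. The `task` interface (`initial_state`, `is_goal`, `applicable_actions`, `successors`), the evaluation function, and the pruning key are hypothetical simplifications: the actual AND* planner uses refined policy equivalences, FOND heuristics, and a proper solution test (e.g., handling dead ends), none of which are shown here.

```python
import heapq
from itertools import count

def reachable_open_states(task, policy):
    """States reachable from the initial state under `policy` that are
    neither goal states nor already mapped by the policy."""
    open_states, stack, visited = set(), [task.initial_state], set()
    while stack:
        s = stack.pop()
        if s in visited or task.is_goal(s):
            continue
        visited.add(s)
        if s in policy:
            stack.extend(task.successors(s, policy[s]))  # all non-det. outcomes
        else:
            open_states.add(s)
    return open_states

def policy_space_search(task, heuristic):
    tiebreak = count()
    frontier = [(heuristic({}), next(tiebreak), {})]
    expanded = set()  # equivalence pruning: one representative per mapped-state set
    while frontier:
        _, _, policy = heapq.heappop(frontier)
        key = frozenset(policy)  # the set of states this partial policy maps
        if key in expanded:
            continue  # an equivalent policy was already expanded
        expanded.add(key)
        open_states = reachable_open_states(task, policy)
        if not open_states:
            return policy  # closed policy: every reachable non-goal state is mapped
        s = next(iter(open_states))  # expand by choosing an action for one open state
        for a in task.applicable_actions(s):
            child = {**policy, s: a}
            heapq.heappush(frontier, (len(child) + heuristic(child), next(tiebreak), child))
    return None  # no solution policy exists
```

Pruning on the mapped-state set mirrors the abstract's observation that a solution policy can be reconstructed in polynomial time from just the set of states it maps, so policies over the same state set need not all be expanded.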
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned, but only their deterministic version is deployed.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
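A minimal sketch of the learn-stochastic, deploy-deterministic pattern this summary describes, with `sigma` playing the role of the tunable exploration level (the class and its names are illustrative, not the paper's):

```python
import numpy as np

class GaussianPolicy:
    """Illustrative linear-Gaussian policy: the stochastic version is used
    while learning; the mean (deterministic) version is deployed."""
    def __init__(self, dim, sigma=0.5):
        self.theta = np.zeros(dim)  # mean parameters, learned by policy gradient
        self.sigma = sigma          # exploration level: trades sample complexity
                                    # against deployed deterministic performance
    def act_explore(self, state):   # stochastic action used during learning
        return float(self.theta @ state + self.sigma * np.random.randn())
    def act_deploy(self, state):    # deterministic version used at deployment
        return float(self.theta @ state)
```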
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this is the first non-asymptotic last-iterate policy convergence result for single-time-scale algorithms in constrained MDPs.
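A schematic of a single-time-scale primal-dual update, where the policy parameters and the Lagrange multiplier are updated with the same step size (the interface is hypothetical and omits the paper's actual estimators and guarantees):

```python
def primal_dual_step(theta, lam, eta, reward_grad, cost_grad, cost_value, budget):
    """One schematic update on the Lagrangian L = R(theta) - lam * (C(theta) - budget)."""
    theta = theta + eta * (reward_grad(theta) - lam * cost_grad(theta))  # primal ascent
    lam = max(0.0, lam + eta * (cost_value(theta) - budget))             # projected dual ascent
    return theta, lam
```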
arXiv Detail & Related papers (2023-06-20T17:27:31Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper learns diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
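A sketch of the dispersion-matrix construction, assuming policy embeddings are already available from some sequence encoder over state-action histories (the encoder and names are illustrative):

```python
import numpy as np

def dispersion_matrix(embeddings):
    """Pairwise distances between policy embeddings; keeping its entries
    large encourages a diverse set of policies."""
    E = np.asarray(embeddings)             # shape: (n_policies, embed_dim)
    diffs = E[:, None, :] - E[None, :, :]
    return np.linalg.norm(diffs, axis=-1)  # shape: (n_policies, n_policies)
```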
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
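The LCB idea in schematic form, assuming per-policy value estimates and uncertainty widths are given (this is not the paper's concrete estimator or bound):

```python
import numpy as np

def select_pessimistic(value_estimates, std_errors, beta=1.0):
    """Rank candidate decision rules by a lower confidence bound rather than
    the point estimate, so poorly-explored rules are penalized."""
    lcb = np.asarray(value_estimates) - beta * np.asarray(std_errors)
    return int(np.argmax(lcb))
```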
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes [39.94472154078338]
We propose a unified policy abstraction theory, containing three types of policy abstraction associated with policy features at different levels.
We then generalize them to three policy metrics that quantify the distance (i.e., similarity) of policies.
For the empirical study, we investigate the efficacy of the proposed policy metrics and representations, in characterizing policy difference and conveying policy generalization respectively.
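One illustrative instance of such a policy metric, assuming each policy exposes its action distribution per state (the paper's three metrics are more general):

```python
import numpy as np

def policy_distance(pi_a, pi_b, states):
    """Average total-variation distance between two policies' action
    distributions on a sample of states; pi(s) returns action probabilities."""
    return float(np.mean([0.5 * np.abs(pi_a(s) - pi_b(s)).sum() for s in states]))
```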
arXiv Detail & Related papers (2022-09-16T03:41:50Z)
- Reward-Free Policy Space Compression for Reinforcement Learning [39.04317877999891]
In reinforcement learning, we encode the potential behaviors of an agent interacting with an environment into an infinite set of policies.
We seek a reward-free compression of the policy space into a finite set of representative policies.
We show that this compression of the policy space can be formulated as a set-cover problem, which is inherently NP-hard.
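Since the exact problem is NP-hard, a natural sketch is the standard greedy set-cover approximation, assuming a precomputed coverage relation (names hypothetical):

```python
def greedy_compression(policies, covers):
    """`covers[p]` is the set of policies representative p covers (e.g., those
    within eps of p under some policy metric). Greedy cover gives the
    classic ln(n)-approximation to minimum set cover."""
    uncovered, representatives = set(policies), []
    while uncovered:
        best = max(policies, key=lambda p: len(covers[p] & uncovered))
        if not covers[best] & uncovered:
            break  # remaining policies cannot be covered
        representatives.append(best)
        uncovered -= covers[best]
    return representatives
```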
arXiv Detail & Related papers (2022-02-22T18:11:57Z)
- Policy Manifold Search: Exploring the Manifold Hypothesis for Diversity-based Neuroevolution [4.920145245773581]
This paper proposes a novel method for diversity-based policy search via Neuroevolution.
We use the Quality-Diversity framework which provides a principled approach to policy search.
We also use the Jacobian of the inverse-mapping function to guide the search in the representation space.
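A loose sketch of a Jacobian-guided step in the learned representation space; the mapping, its Jacobian, and the rescaling rule here are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def jacobian_guided_step(z, jacobian, rng, scale=0.1):
    """Rescale a random latent step by the per-direction stretch of the
    inverse mapping (latent -> policy parameters), damping directions the
    decoder amplifies so steps have comparable effect in parameter space."""
    J = jacobian(z)                             # shape: (param_dim, latent_dim)
    stretch = np.linalg.norm(J, axis=0) + 1e-8  # stretch per latent direction
    return z + scale * rng.normal(size=z.shape) / stretch
```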
arXiv Detail & Related papers (2021-04-27T18:52:03Z)
- Policy Manifold Search for Improving Diversity-based Neuroevolution [4.920145245773581]
We propose a novel approach to diversity-based policy search via Neuroevolution.
Our approach iteratively collects policies according to the Quality-Diversity framework.
We use the Jacobian of the inverse transformation to guide the search in the latent space.
arXiv Detail & Related papers (2020-12-15T23:59:49Z)
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward and avoids violation of certain constraints.
To our knowledge, this is the first convergence analysis of SRL algorithms to globally optimal policies.
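A schematic of the constraint-rectification idea in the spirit of CRPO: improve the reward unless some constraint is violated beyond a tolerance, in which case descend that constraint instead (interface and names hypothetical):

```python
def crpo_step(theta, eta, reward_grad, cost_values, cost_grads, limits, tol=0.0):
    for i, (c, limit) in enumerate(zip(cost_values(theta), limits)):
        if c > limit + tol:
            return theta - eta * cost_grads(theta)[i]  # rectify the violated constraint
    return theta + eta * reward_grad(theta)            # all constraints safe: maximize reward
```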
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
- Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate [40.97686031763918]
In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?
We argue that the entropy of the state distribution induced by finite-horizon trajectories is a sensible target.
We present a novel and practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), to learn a policy that maximizes a non-parametric, $k$-nearest neighbors estimate of the state distribution entropy.
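A generic k-nearest-neighbor entropy estimator of the kind MEPOL maximizes, shown up to additive constants (this is a textbook Kozachenko-Leonenko-style estimate, not necessarily the paper's exact form):

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_state_entropy(states, k=4):
    X = np.asarray(states, dtype=float)      # shape: (n_samples, state_dim)
    n, d = X.shape
    dists, _ = cKDTree(X).query(X, k=k + 1)  # k+1: nearest neighbor is the point itself
    radii = dists[:, -1] + 1e-12             # distance to the k-th true neighbor
    return d * float(np.mean(np.log(radii))) + np.log(n)  # up to additive constants
```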
arXiv Detail & Related papers (2020-07-09T08:44:39Z)
- Stable Policy Optimization via Off-Policy Divergence Regularization [50.98542111236381]
Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL).
We propose a new algorithm which stabilizes the policy improvement through a proximity term that constrains the discounted state-action visitation distribution induced by consecutive policies to be close to one another.
The proposed method improves stability and final performance on benchmark high-dimensional control tasks.
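In schematic form, the surrogate couples the usual importance-weighted improvement term with a visitation-divergence penalty; the divergence estimate itself is assumed given, and the names are illustrative rather than the paper's:

```python
import numpy as np

def regularized_surrogate(advantages, ratios, visitation_div, alpha=0.1):
    """Importance-weighted advantages minus alpha times an estimate of the
    divergence between consecutive policies' discounted state-action
    visitation distributions."""
    improvement = float(np.mean(np.asarray(ratios) * np.asarray(advantages)))
    return improvement - alpha * visitation_div
```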
arXiv Detail & Related papers (2020-03-09T13:05:47Z)