Preliminary Results on Using Abstract AND-OR Graphs for Generalized
Solving of Stochastic Shortest Path Problems
- URL: http://arxiv.org/abs/2204.04301v1
- Date: Fri, 8 Apr 2022 21:30:47 GMT
- Title: Preliminary Results on Using Abstract AND-OR Graphs for Generalized
Solving of Stochastic Shortest Path Problems
- Authors: Rushang Karia, Rashmeet Kaur Nayyar, Siddharth Srivastava
- Abstract summary: Shortest Path Problems (SSPs) are goal-oriented problems in the real-world.
A key difficulty for computing solutions for SSPs is finding solutions to even moderately sized problems intractable.
We show that our approach can be embedded in any SSP solver to compute hierarchically optimal policies.
- Score: 25.152899734616298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several goal-oriented problems in the real-world can be naturally expressed
as Stochastic Shortest Path Problems (SSPs). However, a key difficulty for
computing solutions for problems in the SSP framework is that the computational
requirements often make finding solutions to even moderately sized problems
intractable. Solutions to many of such problems can often be expressed as
generalized policies that are quite easy to compute from small examples and are
readily applicable to problems with a larger number of objects and/or different
object names. In this paper, we provide a preliminary study on using canonical
abstractions to compute such generalized policies and represent them as AND-OR
graphs that translate to simple non-deterministic, memoryless controllers. Such
policy structures naturally lend themselves to a hierarchical approach for
solving problems and we show that our approach can be embedded in any SSP
solver to compute hierarchically optimal policies. We conducted an empirical
evaluation on some well-known planning benchmarks and difficult robotics
domains and show that our approach is promising, often computing optimal
policies significantly faster than state-of-art SSP solvers.
Related papers
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling
Problems Based on Constraint Programming [5.070542698701157]
This paper proposes a novel end-to-end approach to solving scheduling problems by means of CP and Reinforcement Learning (RL)
Our approach leverages existing CP solvers to train an agent learning a Priority Dispatching Rule (PDR) that generalizes well to large instances, even from separate datasets.
arXiv Detail & Related papers (2023-06-09T08:24:56Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Sparse Polynomial Optimization: Theory and Practice [5.27013884159732]
Book presents several efforts to tackle this challenge with important scientific implications.
It provides alternative optimization schemes that scale well in terms of computational complexity.
We present sparsity-exploiting hierarchies of relaxations, for either unconstrained or constrained problems.
arXiv Detail & Related papers (2022-08-23T18:56:05Z) - Exploring Viable Algorithmic Options for Learning from Demonstration
(LfD): A Parameterized Complexity Approach [0.0]
In this paper, we show how such a systematic exploration of algorithmic options can be done using parameterized complexity analysis.
We show that none of our problems can be solved efficiently either in general or relative to a number of (often simultaneous) restrictions on environments, demonstrations, and policies.
arXiv Detail & Related papers (2022-05-10T15:54:06Z) - Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
arXiv Detail & Related papers (2022-02-08T16:39:57Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across non-family problems.
We show that for weakly-ized objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Instance-Dependent Confidence and Early Stopping for Reinforcement
Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain textitex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z) - Learning to Schedule Heuristics in Branch-and-Bound [25.79025327341732]
Real-world applications typically require finding good solutions early in the search to enable fast decision-making.
We propose the first data-driven framework for schedulings in an exact MIP solver.
Compared to the default settings of a state-of-the-art academic MIP solver, we are able to reduce the average primal integral by up to 49% on a class of challenging instances.
arXiv Detail & Related papers (2021-03-18T14:49:52Z) - Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (D PDP) to combine the strengths of learned neurals with those of dynamic programming algorithms.
D PDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z) - Learning General Policies from Small Examples Without Supervision [18.718037284357834]
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once.
It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans.
In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner.
arXiv Detail & Related papers (2021-01-03T19:44:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.