Domain-Independent Dynamic Programming: Generic State Space Search for
Combinatorial Optimization
- URL: http://arxiv.org/abs/2211.14409v1
- Date: Sat, 26 Nov 2022 00:15:45 GMT
- Title: Domain-Independent Dynamic Programming: Generic State Space Search for
Combinatorial Optimization
- Authors: Ryo Kuroiwa and J. Christopher Beck
- Abstract summary: We propose domain-independent dynamic programming (DIDP), a new model-based paradigm based on dynamic programming (DP)
We propose Dynamic Programming Description Language (DyPDL), a formalism to define DP models, and develop Cost-Algebraic A* Solver for DyPDL (CAASDy)
- Score: 13.386517072025722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For combinatorial optimization problems, model-based approaches such as
mixed-integer programming (MIP) and constraint programming (CP) aim to decouple
modeling and solving a problem: the 'holy grail' of declarative problem
solving. We propose domain-independent dynamic programming (DIDP), a new
model-based paradigm based on dynamic programming (DP). While DP is not new, it
has typically been implemented as a problem-specific method. We propose Dynamic
Programming Description Language (DyPDL), a formalism to define DP models, and
develop Cost-Algebraic A* Solver for DyPDL (CAASDy), a generic solver for DyPDL
using state space search. We formalize existing problem-specific DP and state
space search methods for combinatorial optimization problems as DP models in
DyPDL. Using CAASDy and commercial MIP and CP solvers, we experimentally
compare the DP models with existing MIP and CP models, showing that, despite
its nascent nature, CAASDy outperforms MIP and CP on a number of common problem
classes.
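The abstract's central mechanism, casting a DP model as a state transition system and solving it with heuristic state space search, can be illustrated with a toy sketch (this is not CAASDy or a DyPDL model; the problem choice and names are illustrative): 0-1 knapsack as a DP whose states are (next item, remaining capacity), solved by A* with an admissible profit bound.

```python
import heapq

def knapsack_astar(weights, profits, capacity):
    """0-1 knapsack as DP state-space search: a state is
    (next item index, remaining capacity); A* maximizes profit."""
    n = len(weights)
    # Admissible bound: total profit of all items not yet decided.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + profits[i]

    best = {}  # state -> best profit found so far
    # Max-heap via negated keys: (-(g + h), -g, state).
    heap = [(-suffix[0], 0, (0, capacity))]
    while heap:
        _, neg_g, (i, cap) = heapq.heappop(heap)
        g = -neg_g
        if i == n:
            return g  # first goal popped is optimal (bound is admissible)
        if best.get((i, cap), -1) > g:
            continue  # stale queue entry
        for take in (False, True):
            if take and weights[i] > cap:
                continue
            g2 = g + (profits[i] if take else 0)
            s2 = (i + 1, cap - (weights[i] if take else 0))
            if g2 > best.get(s2, -1):
                best[s2] = g2
                heapq.heappush(heap, (-(g2 + suffix[i + 1]), -g2, s2))
    return 0
```

The same search skeleton works for any DP model expressed as states, transitions, and costs, which is the generality DyPDL formalizes.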
Related papers
- Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming [8.495921422521068]
A multi-model Markov decision process (MMDP) is a promising framework for computing policies that are robust to uncertainty over the MDP model.
MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models.
We propose CADP, which combines a coordinate ascent method and a dynamic programming algorithm for solving MMDPs.
arXiv Detail & Related papers (2024-07-08T18:47:59Z)
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
- Domain-Independent Dynamic Programming [5.449167190254984]
Domain-independent dynamic programming (DIDP) is a new model-based paradigm based on dynamic programming (DP)
We introduce Dynamic Programming Description Language (DyPDL), a formalism to define DP models based on a state transition system, inspired by AI planning.
We show that search algorithms can be used to solve DyPDL models and propose seven DIDP solvers.
arXiv Detail & Related papers (2024-01-25T01:48:09Z)
- DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems [41.57773395100222]
Deep reinforcement learning (DRL) models have shown promising results in solving NP-hard Combinatorial Optimization problems.
This paper addresses the scalability challenge in large-scale optimization by proposing a novel approach, namely, DIMES.
Unlike previous DRL methods which suffer from costly autoregressive decoding or iterative refinements of discrete solutions, DIMES introduces a compact continuous space for parameterizing the underlying distribution of candidate solutions.
Extensive experiments show that DIMES outperforms recent DRL-based methods on large benchmark datasets for Traveling Salesman Problems and Maximal Independent Set problems.
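The idea of a compact continuous parameterization can be caricatured in a few lines (a toy sketch, not the authors' model: DIMES uses a graph neural network and meta-learning, while here a plain per-node logit vector and a REINFORCE-style gradient stand in): the logits define a product distribution over node inclusions for maximum independent set, candidate solutions are sampled from it, and the parameters move toward high-reward samples.

```python
import math, random

def mis_reinforce(n, edges, steps=300, samples=32, lr=0.5, penalty=2.0, seed=0):
    """Toy sketch of search over a continuous parameterization:
    per-node logits define independent Bernoulli inclusion
    probabilities; a REINFORCE update pushes logits toward samples
    that score well (independent-set size minus edge penalties)."""
    rng = random.Random(seed)
    theta = [0.0] * n  # one logit per node
    best_set, best_score = set(), float("-inf")
    for _ in range(steps):
        probs = [1 / (1 + math.exp(-t)) for t in theta]
        grads = [0.0] * n
        rewards, batch = [], []
        for _ in range(samples):
            x = [1 if rng.random() < p else 0 for p in probs]
            # Reward: selected nodes minus a penalty per violated edge.
            viol = sum(x[u] and x[v] for u, v in edges)
            r = sum(x) - penalty * viol
            rewards.append(r)
            batch.append(x)
            if viol == 0 and r > best_score:
                best_score, best_set = r, {i for i, b in enumerate(x) if b}
        baseline = sum(rewards) / samples
        for x, r in zip(batch, rewards):
            adv = r - baseline
            for i in range(n):
                grads[i] += adv * (x[i] - probs[i])  # d log p(x) / d theta_i
        theta = [t + lr * g / samples for t, g in zip(theta, grads)]
    return best_set
```

Because every sample is a complete solution drawn in one shot, there is no autoregressive decoding loop, which is the scalability point the abstract emphasizes.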
arXiv Detail & Related papers (2022-10-08T23:24:37Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Linear programming-based solution methods for constrained POMDPs [0.5156484100374059]
Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena.
We use grid-based approximations in combination with linear programming (LP) models to generate approximate policies for CPOMDPs.
arXiv Detail & Related papers (2022-06-28T15:22:24Z)
- Neural Stochastic Dual Dynamic Programming [99.80617899593526]
We introduce a trainable neural model that learns to map problem instances to a piecewise-linear value function.
ν-SDDP can significantly reduce problem-solving cost without sacrificing solution quality.
arXiv Detail & Related papers (2021-12-01T22:55:23Z)
- A review of approaches to modeling applied vehicle routing problems [77.34726150561087]
We review the approaches for modeling vehicle routing problems.
We formulate several criteria for evaluating modeling methods.
We discuss several future research avenues in the field of modeling VRP domains.
arXiv Detail & Related papers (2021-05-23T14:50:14Z)
- Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (DPDP) to combine the strengths of learned neural heuristics with those of dynamic programming algorithms.
DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
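The restricted-DP idea can be sketched generically (illustrative only: DPDP scores edges with a trained network, whereas here the `heat` matrix is a hand-supplied stand-in for those learned edge scores): run the Held-Karp TSP recursion layer by layer, but keep only a beam of states ranked by accumulated policy score.

```python
def beam_tsp(dist, heat, beam_width=8):
    """Sketch of policy-restricted DP for TSP: the Held-Karp
    recursion, expanded layer by layer, but keeping only the
    `beam_width` states with the best policy score per layer.
    `heat[i][j]` stands in for a learned edge score."""
    n = len(dist)
    # state: (visited frozenset, last city) -> (tour cost, policy score)
    layer = {(frozenset([0]), 0): (0.0, 0.0)}
    for _ in range(n - 1):
        nxt = {}
        for (visited, last), (cost, score) in layer.items():
            for c in range(n):
                if c in visited:
                    continue
                key = (visited | {c}, c)
                cand = (cost + dist[last][c], score + heat[last][c])
                if key not in nxt or cand[0] < nxt[key][0]:
                    nxt[key] = cand  # keep cheapest path into each state
        # Restrict the DP: retain only the highest-scoring states.
        layer = dict(sorted(nxt.items(), key=lambda kv: -kv[1][1])[:beam_width])
    # Close the tour back to city 0.
    return min(cost + dist[last][0]
               for (visited, last), (cost, _) in layer.items())
```

With a large enough beam this reduces to exact Held-Karp; shrinking the beam trades optimality for the speed the neural policy is meant to buy back.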
arXiv Detail & Related papers (2021-02-23T15:33:57Z)
- GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction [69.94831587339539]
We present a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space.
Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions.
arXiv Detail & Related papers (2020-02-17T20:21:20Z)
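The baseline that GACEM modifies is the classic Cross-Entropy Method; a minimal sketch with a diagonal Gaussian search distribution follows (the paper replaces this Gaussian with a masked autoregressive network precisely so that the search distribution can be multi-modal and track several solution regions at once):

```python
import random, statistics

def cem_minimize(f, dim, iters=40, pop=64, elite_frac=0.2, seed=0):
    """Plain Cross-Entropy Method with a diagonal Gaussian:
    sample a population, keep the elites, refit the distribution
    to them, and repeat."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [2.0] * dim
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        xs = [[rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(pop)]
        xs.sort(key=f)  # ascending objective value
        elites = xs[:n_elite]
        for j in range(dim):  # refit each coordinate independently
            col = [x[j] for x in elites]
            mu[j] = statistics.mean(col)
            sigma[j] = max(statistics.stdev(col), 1e-3)  # avoid collapse
    return mu
```

A unimodal Gaussian like this must commit to a single region; swapping in an autoregressive model over the solution space is what lets the modified CEM express the "complicated solution spaces" the summary mentions.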
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.