Domain-Independent Dynamic Programming: Generic State Space Search for
Combinatorial Optimization
- URL: http://arxiv.org/abs/2211.14409v1
- Date: Sat, 26 Nov 2022 00:15:45 GMT
- Title: Domain-Independent Dynamic Programming: Generic State Space Search for
Combinatorial Optimization
- Authors: Ryo Kuroiwa and J. Christopher Beck
- Abstract summary: We propose domain-independent dynamic programming (DIDP), a new model-based paradigm based on dynamic programming (DP).
We propose Dynamic Programming Description Language (DyPDL), a formalism to define DP models, and develop Cost-Algebraic A* Solver for DyPDL (CAASDy).
- Score: 13.386517072025722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For combinatorial optimization problems, model-based approaches such as
mixed-integer programming (MIP) and constraint programming (CP) aim to decouple
modeling and solving a problem: the 'holy grail' of declarative problem
solving. We propose domain-independent dynamic programming (DIDP), a new
model-based paradigm based on dynamic programming (DP). While DP is not new, it
has typically been implemented as a problem-specific method. We propose Dynamic
Programming Description Language (DyPDL), a formalism to define DP models, and
develop Cost-Algebraic A* Solver for DyPDL (CAASDy), a generic solver for DyPDL
using state space search. We formalize existing problem-specific DP and state
space search methods for combinatorial optimization problems as DP models in
DyPDL. Using CAASDy and commercial MIP and CP solvers, we experimentally
compare the DP models with existing MIP and CP models, showing that, despite
its nascent nature, CAASDy outperforms MIP and CP on a number of common problem
classes.
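To make the DIDP idea concrete, the following is a minimal illustrative sketch in Python, not the authors' CAASDy or DyPDL implementation: a DP model of the travelling salesman problem is solved by best-first search over its state space, where a state is the pair (current city, set of unvisited cities). With the zero heuristic used here this is uniform-cost search; an A* solver in the spirit of CAASDy would additionally use an admissible heuristic (a dual bound on the remaining cost).

```python
import heapq
import itertools

def solve_tsp(dist):
    """dist[i][j] is the travel cost from city i to city j; the tour starts and ends at city 0."""
    n = len(dist)
    start = (0, frozenset(range(1, n)))   # at the depot, every other city unvisited
    goal = (0, frozenset())               # back at the depot, nothing left to visit
    best = {start: 0}                     # cheapest known cost to reach each state
    tie = itertools.count()               # tie-breaker so heapq never compares states
    queue = [(0, next(tie), start)]       # entries are (g-value, tie-breaker, state)

    while queue:
        g, _, state = heapq.heappop(queue)
        if g > best.get(state, float("inf")):
            continue                      # stale queue entry
        if state == goal:
            return g                      # first goal expansion is optimal
        city, unvisited = state
        if unvisited:                     # transitions: visit one unvisited city j
            children = [((j, unvisited - {j}), dist[city][j]) for j in unvisited]
        else:                             # base case: return to the depot
            children = [(goal, dist[city][0])]
        for child, cost in children:
            new_g = g + cost
            if new_g < best.get(child, float("inf")):
                best[child] = new_g
                heapq.heappush(queue, (new_g, next(tie), child))
    return float("inf")                   # unreachable for a complete distance matrix

if __name__ == "__main__":
    dist = [[0, 2, 9, 10],
            [1, 0, 6, 4],
            [15, 7, 0, 8],
            [6, 3, 12, 0]]
    print(solve_tsp(dist))                # prints 21, the optimal tour cost for this instance
```

Duplicate detection via the `best` dictionary is what distinguishes this from plain tree search: each DP state is expanded at most once, with its optimal cost-to-reach, mirroring the memoization of classical DP and illustrating the correspondence between DP models and state space search that DIDP builds on.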
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are standard metrics for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming [8.495921422521068]
The multi-model Markov decision process (MMDP) is a promising framework for computing policies under model uncertainty.
MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models.
We propose CADP, which combines a coordinate ascent method and a dynamic programming algorithm for solving MMDPs.
arXiv Detail & Related papers (2024-07-08T18:47:59Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - Domain-Independent Dynamic Programming [5.449167190254984]
Domain-independent dynamic programming (DIDP) is a new model-based paradigm based on dynamic programming (DP).
We introduce Dynamic Programming Description Language (DyPDL), a formalism to define DP models based on a state transition system, inspired by AI planning; an illustrative DP recursion in this state-transition style is given after this list.
We show that search algorithms can be used to solve DyPDL models and propose seven DIDP solvers.
arXiv Detail & Related papers (2024-01-25T01:48:09Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Linear programming-based solution methods for constrained POMDPs [0.5156484100374059]
Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena.
We use grid-based approximations in combination with linear programming (LP) models to generate approximate policies for CPOMDPs.
arXiv Detail & Related papers (2022-06-28T15:22:24Z) - Neural Stochastic Dual Dynamic Programming [99.80617899593526]
We introduce a trainable neural model that learns to map problem instances to a piece-wise linear value function.
$\nu$-SDDP can significantly reduce problem-solving cost without sacrificing solution quality.
arXiv Detail & Related papers (2021-12-01T22:55:23Z) - Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (DPDP) to combine the strengths of learned neural heuristics with those of dynamic programming algorithms.
DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z) - GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal
Black Box Constraint Satisfaction [69.94831587339539]
We present a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space.
Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions.
arXiv Detail & Related papers (2020-02-17T20:21:20Z)
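To make the state-transition-system view of DP models mentioned in the Domain-Independent Dynamic Programming entry above concrete, here is an illustrative recursion (our own example, written in generic Bellman-equation form rather than DyPDL syntax) for the travelling salesman problem, with state (U, i) consisting of the set U of unvisited cities and the current city i, and c_{ij} the travel cost from city i to city j:

```latex
\[
V(U, i) =
\begin{cases}
  c_{i0} & \text{if } U = \emptyset,\\
  \min_{j \in U}\bigl(c_{ij} + V(U \setminus \{j\},\, j)\bigr) & \text{otherwise,}
\end{cases}
\qquad
\text{optimal tour cost} = V(\{1,\dots,n-1\},\, 0).
\]
```

Each minimand corresponds to one applicable transition from the state ("visit city j"), and the empty-set case is the base case; a state space search solver, such as the sketch given after the abstract above, expands exactly these transitions.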