Constrained Combinatorial Optimization with Reinforcement Learning
- URL: http://arxiv.org/abs/2006.11984v1
- Date: Mon, 22 Jun 2020 03:13:07 GMT
- Title: Constrained Combinatorial Optimization with Reinforcement Learning
- Authors: Ruben Solozabal, Josu Ceberio, and Martin Takáč
- Abstract summary: This paper presents a framework to tackle constrained combinatorial optimization problems using deep Reinforcement Learning (RL).
We extend the Neural Combinatorial Optimization (NCO) theory to handle constraints in its formulation.
In that context, the solution is iteratively constructed based on interactions with the environment.
- Score: 0.30938904602244344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a framework to tackle constrained combinatorial
optimization problems using deep Reinforcement Learning (RL). To this end, we
extend the Neural Combinatorial Optimization (NCO) theory in order to deal with
constraints in its formulation.
Notably, we propose defining constrained combinatorial problems as fully
observable Constrained Markov Decision Processes (CMDP). In that context, the
solution is iteratively constructed based on interactions with the environment.
The model, in addition to the reward signal, relies on penalty signals
generated from constraint dissatisfaction to infer a policy that acts as a
heuristic algorithm. Moreover, having access to the complete state
representation during the optimization process allows us to rely on memory-less
architectures, enhancing the results obtained in previous sequence-to-sequence
approaches. Conducted experiments on the constrained Job Shop and Resource
Allocation problems prove the superiority of the proposal for computing rapid
solutions when compared to classical heuristic, metaheuristic, and Constraint
Programming (CP) solvers.
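The penalty mechanism described in the abstract can be illustrated with a short, self-contained sketch. The toy knapsack instance, the per-item Bernoulli policy, and the fixed penalty coefficient below are illustrative assumptions rather than the authors' model: the learning signal is simply the reward minus a penalty generated from constraint dissatisfaction, optimized with REINFORCE while the solution is constructed action by action.
```python
# Minimal sketch of penalty-based constrained RL for combinatorial problems.
# The instance, policy class, and constants are assumptions for illustration.
import math, random

values   = [6.0, 5.0, 4.0, 3.0]   # toy item values
weights  = [4.0, 3.0, 2.0, 1.0]   # toy item weights
capacity = 6.0                    # knapsack constraint
LAMBDA   = 2.0                    # fixed penalty coefficient (hyperparameter)
LR       = 0.1

logits = [0.0] * len(values)      # one Bernoulli policy parameter per item

def sample_solution():
    """Construct a solution action by action: include/exclude each item."""
    acts = []
    for l in logits:
        p = 1.0 / (1.0 + math.exp(-l))   # P(include item)
        acts.append(1 if random.random() < p else 0)
    return acts

def signal(acts):
    """Reward minus a penalty that grows with constraint dissatisfaction."""
    value  = sum(v for v, a in zip(values, acts) if a)
    weight = sum(w for w, a in zip(weights, acts) if a)
    violation = max(0.0, weight - capacity)
    return value - LAMBDA * violation

baseline = 0.0
for step in range(2000):
    acts = sample_solution()
    r = signal(acts)
    baseline += 0.05 * (r - baseline)      # moving-average baseline
    for i, a in enumerate(acts):           # REINFORCE ascent on the signal
        p = 1.0 / (1.0 + math.exp(-logits[i]))
        grad_logp = a - p                  # d log pi(a) / d logit
        logits[i] += LR * (r - baseline) * grad_logp

print("P(include):", [round(1 / (1 + math.exp(-l)), 2) for l in logits])
```
The same structure carries over to the paper's setting: the per-item Bernoulli policy stands in for a neural policy over a full state representation, and the fixed coefficient stands in for the paper's penalty signals.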
Related papers
- Alternating Minimization Schemes for Computing Rate-Distortion-Perception Functions with $f$-Divergence Perception Constraints [10.564071872770146]
We study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources.
We characterize the optimal parametric solutions.
We provide sufficient conditions on the distortion and the perception constraints.
arXiv Detail & Related papers (2024-08-27T12:50:12Z)
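For context, the rate-distortion-perception function named in this entry is standardly defined (following the Blau-Michaeli formulation; the notation below is assumed rather than taken from the paper) as the least rate achievable under a distortion budget and a perception budget:
```latex
% Standard RDPF definition: Delta is a per-letter distortion measure and
% d(.,.) a divergence between distributions -- an f-divergence in this paper.
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{subject to} \quad
\mathbb{E}\!\left[\Delta(X, \hat{X})\right] \le D,
\qquad d\!\left(p_X, p_{\hat{X}}\right) \le P.
```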
- Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization [1.1510009152620668]
We present a simple and problem-independent sequence decoding method for self-improved learning.
By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives.
Our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.
arXiv Detail & Related papers (2024-07-24T12:06:09Z)
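The mechanism in the entry above ("ignore previously sampled sequences") admits a minimal sketch: sample candidate sequences, discard any already seen so that only unseen alternatives are evaluated, and keep the incumbent best. The toy instance, the uniform stand-in policy, and the rejection scheme are assumptions; the paper's actual method modifies the policy distribution itself.
```python
# Illustrative-only take on "consider only unseen alternatives" in decoding.
import random

jobs = [3, 1, 4, 1, 5]                  # toy processing times
def cost(seq):                           # placeholder objective
    return sum((i + 1) * jobs[j] for i, j in enumerate(seq))

def sample_sequence(rng):
    order = list(range(len(jobs)))
    rng.shuffle(order)                   # stand-in for policy sampling
    return tuple(order)

rng = random.Random(0)
seen, best, best_cost = set(), None, float("inf")
for _ in range(200):
    seq = sample_sequence(rng)
    if seq in seen:
        continue                         # previously sampled: ignore it
    seen.add(seq)
    c = cost(seq)
    if c < best_cost:
        best, best_cost = seq, c
print(best, best_cost)
```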
- Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
arXiv Detail & Related papers (2023-12-28T23:15:18Z)
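The equivalence claimed in the entry above can be seen on a one-dimensional toy: differentiating through T unrolled gradient-descent steps propagates a Jacobian by a fixed-point recursion, i.e. a stationary iterative method applied to a linear system. The inner problem and step size below are assumptions chosen so every quantity has a closed form.
```python
# Hedged sketch: unrolled gradient descent on the assumed inner problem
# min_x f(x; theta) = (x - theta)^2, tracking dx_k/dtheta alongside x_k.
ALPHA, T, THETA = 0.2, 30, 3.0

x, J = 0.0, 0.0                        # iterate and its derivative w.r.t. theta
for _ in range(T):
    grad = 2.0 * (x - THETA)           # d f / d x
    x = x - ALPHA * grad               # unrolled forward step
    # Chain rule through the step: J <- (1 - 2a) J + 2a. This is a
    # stationary iterative method for the linear system (2a) J = 2a,
    # whose solution J* = 1 matches the implicit-function-theorem answer
    # (the minimizer is x*(theta) = theta, so dx*/dtheta = 1).
    J = (1.0 - 2.0 * ALPHA) * J + 2.0 * ALPHA

print(f"x_T = {x:.4f} (-> {THETA}), dx_T/dtheta = {J:.4f} (-> 1.0)")
```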
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis, we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
- Algorithm for Constrained Markov Decision Process with Linear Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed that integrates two ingredients: an entropy-regularized policy and Vaidya's dual.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z)
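Stripped to its skeleton, the dual approach in the entry above combines an entropy-regularized policy update with ascent on a dual variable for the cost constraint. The sketch below is a generic primal-dual loop on an assumed two-action toy CMDP; it substitutes a plain dual (sub)gradient step for Vaidya's method, so it illustrates the structure rather than the paper's algorithm.
```python
# Generic primal-dual sketch for a toy CMDP; all constants are assumptions.
import math

R, C = [1.0, 0.3], [1.0, 0.0]    # per-action reward and cost
BUDGET, TAU = 0.4, 0.01          # cost budget, entropy weight
ETA_P, ETA_D = 0.5, 0.2          # primal / dual step sizes

logit, lam = 0.0, 0.0            # policy parameter, dual variable
for _ in range(500):
    p = 1.0 / (1.0 + math.exp(-logit))          # P(action 0)
    exp_c = p * C[0] + (1 - p) * C[1]           # expected cost
    # Primal ascent on L = E[r] - lam * (E[c] - b) + tau * H(pi);
    # dH/dp = -log(p / (1 - p)) = -logit for this parameterization.
    dL_dp = (R[0] - R[1]) - lam * (C[0] - C[1]) - TAU * logit
    logit += ETA_P * dL_dp * p * (1 - p)
    # Dual ascent: raise lam while the cost constraint is violated.
    lam = max(0.0, lam + ETA_D * (exp_c - BUDGET))

p = 1.0 / (1.0 + math.exp(-logit))
print(f"P(action 0) = {p:.3f} (target <= {BUDGET}), lambda = {lam:.3f}")
```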
- A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning [0.6445605125467573]
Evolutionary strategies have been shown to achieve competitive levels of performance on complex optimization problems in reinforcement learning.
Convergence guarantees for evolutionary strategies on constrained problems are, however, lacking in the literature.
arXiv Detail & Related papers (2022-02-21T17:04:51Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning [12.523496806744946]
We propose a successive convex approximation based off-policy optimization (SCAOPO) algorithm to solve the general constrained reinforcement learning problem.
Despite the time-varying state distribution and the bias incurred by off-policy learning, SCAOPO with a feasible initial point can still provably converge to a Karush-Kuhn-Tucker (KKT) point.
arXiv Detail & Related papers (2021-05-26T13:52:39Z)
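The "successive convex approximation" template behind SCAOPO can be shown deterministically: around the current iterate, replace the (possibly nonconvex) objective with a convex surrogate and solve the resulting convex subproblem exactly. Everything below (the 1-D objective, the proximal-linear surrogate, the box constraint) is an assumed toy, with the stochastic and off-policy machinery removed.
```python
# Deterministic toy of successive convex approximation (SCA).
def f(x):  return x**4 - 4 * x**2      # nonconvex objective
def df(x): return 4 * x**3 - 8 * x     # its derivative

TAU = 4.0                              # proximal weight (keeps surrogate convex)
x = 0.2                                # feasible starting point
for _ in range(30):
    # Convex surrogate around x: f(x) + df(x)(y - x) + TAU/2 (y - x)^2.
    # Its minimizer over the convex feasible set [-1, 1] is a clipped prox step.
    y = x - df(x) / TAU
    x = max(-1.0, min(1.0, y))
print(f"x = {x:.4f}, f(x) = {f(x):.4f}")   # converges to x = 1, f = -3
```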
- Reversible Action Design for Combinatorial Optimization with Reinforcement Learning [35.50454156611722]
Reinforcement learning (RL) has recently emerged as a new framework for tackling combinatorial optimization problems (COPs).
We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a wide class of COPs.
arXiv Detail & Related papers (2021-02-14T18:05:42Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)