Constrained Combinatorial Optimization with Reinforcement Learning
- URL: http://arxiv.org/abs/2006.11984v1
- Date: Mon, 22 Jun 2020 03:13:07 GMT
- Title: Constrained Combinatorial Optimization with Reinforcement Learning
- Authors: Ruben Solozabal, Josu Ceberio, and Martin Takáč
- Abstract summary: This paper presents a framework to tackle constrained combinatorial optimization problems using deep Reinforcement Learning (RL).
We extend the Neural Combinatorial Optimization (NCO) theory to handle constraints in its formulation.
In that context, the solution is iteratively constructed based on interactions with the environment.
- Score: 0.30938904602244344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a framework to tackle constrained combinatorial
optimization problems using deep Reinforcement Learning (RL). To this end, we
extend the Neural Combinatorial Optimization (NCO) theory in order to deal with
constraints in its formulation.
Notably, we propose defining constrained combinatorial problems as fully
observable Constrained Markov Decision Processes (CMDP). In that context, the
solution is iteratively constructed based on interactions with the environment.
The model, in addition to the reward signal, relies on penalty signals
generated from constraint dissatisfaction to infer a policy that acts as a
heuristic algorithm. Moreover, having access to the complete state
representation during the optimization process allows us to rely on memory-less
architectures, enhancing the results obtained in previous sequence-to-sequence
approaches. Conducted experiments on the constrained Job Shop and Resource
Allocation problems prove the superiority of the proposal for computing rapid
solutions when compared to classical heuristic, metaheuristic, and Constraint
Programming (CP) solvers.
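The penalty mechanism described in the abstract can be illustrated with a short, self-contained sketch. The toy knapsack instance, the per-item Bernoulli policy, and the fixed penalty coefficient below are illustrative assumptions rather than the authors' model: the learning signal is simply the reward minus a penalty generated from constraint dissatisfaction, optimized with REINFORCE while the solution is constructed action by action.
```python
# Minimal sketch of penalty-based constrained RL for combinatorial problems.
# The instance, policy class, and constants are assumptions for illustration.
import math, random

values   = [6.0, 5.0, 4.0, 3.0]   # toy item values
weights  = [4.0, 3.0, 2.0, 1.0]   # toy item weights
capacity = 6.0                    # knapsack constraint
LAMBDA   = 2.0                    # fixed penalty coefficient (hyperparameter)
LR       = 0.1

logits = [0.0] * len(values)      # one Bernoulli policy parameter per item

def sample_solution():
    """Construct a solution action by action: include/exclude each item."""
    acts = []
    for l in logits:
        p = 1.0 / (1.0 + math.exp(-l))   # P(include item)
        acts.append(1 if random.random() < p else 0)
    return acts

def signal(acts):
    """Reward minus a penalty that grows with constraint dissatisfaction."""
    value  = sum(v for v, a in zip(values, acts) if a)
    weight = sum(w for w, a in zip(weights, acts) if a)
    violation = max(0.0, weight - capacity)
    return value - LAMBDA * violation

baseline = 0.0
for step in range(2000):
    acts = sample_solution()
    r = signal(acts)
    baseline += 0.05 * (r - baseline)      # moving-average baseline
    for i, a in enumerate(acts):           # REINFORCE ascent on the signal
        p = 1.0 / (1.0 + math.exp(-logits[i]))
        grad_logp = a - p                  # d log pi(a) / d logit
        logits[i] += LR * (r - baseline) * grad_logp

print("P(include):", [round(1 / (1 + math.exp(-l)), 2) for l in logits])
```
The same structure carries over to the paper's setting: the per-item Bernoulli policy stands in for a neural policy over a full state representation, and the fixed coefficient stands in for the paper's penalty signals.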
Related papers
- Alternating Minimization Schemes for Computing Rate-Distortion-Perception Functions with $f$-Divergence Perception Constraints [10.564071872770146]
We study the computation of the rate-distortion-perception function (RDPF) for discrete memoryless sources.
We characterize the optimal parametric solutions.
We provide sufficient conditions on the distortion and the perception constraints.
arXiv Detail & Related papers (2024-08-27T12:50:12Z)
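For context, the rate-distortion-perception function named in this entry is standardly defined (following the Blau-Michaeli formulation; the notation below is assumed rather than taken from the paper) as the least rate achievable under a distortion budget and a perception budget:
```latex
% Standard RDPF definition: Delta is a per-letter distortion measure and
% d(.,.) a divergence between distributions -- an f-divergence in this paper.
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{subject to} \quad
\mathbb{E}\!\left[\Delta(X, \hat{X})\right] \le D,
\qquad d\!\left(p_X, p_{\hat{X}}\right) \le P.
```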
- Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization [1.1510009152620668]
We present a simple and problem-independent sequence decoding method for self-improved learning.
By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives.
Our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.
arXiv Detail & Related papers (2024-07-24T12:06:09Z)
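The mechanism in the entry above ("ignore previously sampled sequences") admits a minimal sketch: sample candidate sequences, discard any already seen so that only unseen alternatives are evaluated, and keep the incumbent best. The toy instance, the uniform stand-in policy, and the rejection scheme are assumptions; the paper's actual method modifies the policy distribution itself.
```python
# Illustrative-only take on "consider only unseen alternatives" in decoding.
import random

jobs = [3, 1, 4, 1, 5]                  # toy processing times
def cost(seq):                           # placeholder objective
    return sum((i + 1) * jobs[j] for i, j in enumerate(seq))

def sample_sequence(rng):
    order = list(range(len(jobs)))
    rng.shuffle(order)                   # stand-in for policy sampling
    return tuple(order)

rng = random.Random(0)
seen, best, best_cost = set(), None, float("inf")
for _ in range(200):
    seq = sample_sequence(rng)
    if seq in seen:
        continue                         # previously sampled: ignore it
    seen.add(seq)
    c = cost(seq)
    if c < best_cost:
        best, best_cost = seq, c
print(best, best_cost)
```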
- Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
arXiv Detail & Related papers (2023-12-28T23:15:18Z)
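The equivalence claimed in the entry above can be seen on a one-dimensional toy: differentiating through T unrolled gradient-descent steps propagates a Jacobian by a fixed-point recursion, i.e. a stationary iterative method applied to a linear system. The inner problem and step size below are assumptions chosen so every quantity has a closed form.
```python
# Hedged sketch: unrolled gradient descent on the assumed inner problem
# min_x f(x; theta) = (x - theta)^2, tracking dx_k/dtheta alongside x_k.
ALPHA, T, THETA = 0.2, 30, 3.0

x, J = 0.0, 0.0                        # iterate and its derivative w.r.t. theta
for _ in range(T):
    grad = 2.0 * (x - THETA)           # d f / d x
    x = x - ALPHA * grad               # unrolled forward step
    # Chain rule through the step: J <- (1 - 2a) J + 2a. This is a
    # stationary iterative method for the linear system (2a) J = 2a,
    # whose solution J* = 1 matches the implicit-function-theorem answer
    # (the minimizer is x*(theta) = theta, so dx*/dtheta = 1).
    J = (1.0 - 2.0 * ALPHA) * J + 2.0 * ALPHA

print(f"x_T = {x:.4f} (-> {THETA}), dx_T/dtheta = {J:.4f} (-> 1.0)")
```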
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis, we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
- Algorithm for Constrained Markov Decision Process with Linear Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed that integrates two ingredients: an entropy-regularized policy and Vaidya's dual.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z)
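Stripped to its skeleton, the dual approach in the entry above combines an entropy-regularized policy update with ascent on a dual variable for the cost constraint. The sketch below is a generic primal-dual loop on an assumed two-action toy CMDP; it substitutes a plain dual (sub)gradient step for Vaidya's method, so it illustrates the structure rather than the paper's algorithm.
```python
# Generic primal-dual sketch for a toy CMDP; all constants are assumptions.
import math

R, C = [1.0, 0.3], [1.0, 0.0]    # per-action reward and cost
BUDGET, TAU = 0.4, 0.01          # cost budget, entropy weight
ETA_P, ETA_D = 0.5, 0.2          # primal / dual step sizes

logit, lam = 0.0, 0.0            # policy parameter, dual variable
for _ in range(500):
    p = 1.0 / (1.0 + math.exp(-logit))          # P(action 0)
    exp_c = p * C[0] + (1 - p) * C[1]           # expected cost
    # Primal ascent on L = E[r] - lam * (E[c] - b) + tau * H(pi);
    # dH/dp = -log(p / (1 - p)) = -logit for this parameterization.
    dL_dp = (R[0] - R[1]) - lam * (C[0] - C[1]) - TAU * logit
    logit += ETA_P * dL_dp * p * (1 - p)
    # Dual ascent: raise lam while the cost constraint is violated.
    lam = max(0.0, lam + ETA_D * (exp_c - BUDGET))

p = 1.0 / (1.0 + math.exp(-logit))
print(f"P(action 0) = {p:.3f} (target <= {BUDGET}), lambda = {lam:.3f}")
```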
- A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning [0.6445605125467573]
Evolutionary strategies have been shown to achieve competitive levels of performance on complex optimization problems in reinforcement learning.
Convergence guarantees for evolutionary strategies on constrained problems are, however, lacking in the literature.
arXiv Detail & Related papers (2022-02-21T17:04:51Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning [12.523496806744946]
We propose a successive convex approximation based off-policy optimization (SCAOPO) algorithm to solve the general constrained reinforcement learning problem.
Despite the time-varying state distribution and the bias incurred by off-policy learning, SCAOPO with a feasible initial point can still provably converge to a Karush-Kuhn-Tucker (KKT) point.
arXiv Detail & Related papers (2021-05-26T13:52:39Z)
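The "successive convex approximation" template behind SCAOPO can be shown deterministically: around the current iterate, replace the (possibly nonconvex) objective with a convex surrogate and solve the resulting convex subproblem exactly. Everything below (the 1-D objective, the proximal-linear surrogate, the box constraint) is an assumed toy, with the stochastic and off-policy machinery removed.
```python
# Deterministic toy of successive convex approximation (SCA).
def f(x):  return x**4 - 4 * x**2      # nonconvex objective
def df(x): return 4 * x**3 - 8 * x     # its derivative

TAU = 4.0                              # proximal weight (keeps surrogate convex)
x = 0.2                                # feasible starting point
for _ in range(30):
    # Convex surrogate around x: f(x) + df(x)(y - x) + TAU/2 (y - x)^2.
    # Its minimizer over the convex feasible set [-1, 1] is a clipped prox step.
    y = x - df(x) / TAU
    x = max(-1.0, min(1.0, y))
print(f"x = {x:.4f}, f(x) = {f(x):.4f}")   # converges to x = 1, f = -3
```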
- Reversible Action Design for Combinatorial Optimization with Reinforcement Learning [35.50454156611722]
Reinforcement learning (RL) has recently emerged as a new framework for tackling combinatorial optimization problems (COPs).
We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a wide class of COPs.
arXiv Detail & Related papers (2021-02-14T18:05:42Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)