BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial
Optimization
- URL: http://arxiv.org/abs/2301.03313v3
- Date: Thu, 28 Sep 2023 21:06:39 GMT
- Title: BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial
Optimization
- Authors: Darko Drakulic, Sofia Michel, Florian Mai, Arnaud Sors and Jean-Marc
Andreoli
- Abstract summary: We present a novel formulation of Combinatorial Optimization Problems (COPs) as Markov Decision Processes (MDPs).
We show how the reduced state exploits the symmetries of these problems and facilitates MDP solving.
We illustrate our approach on five classical problems: the Euclidean and Asymmetric Traveling Salesman, Capacitated Vehicle Routing, Orienteering and Knapsack Problems.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success of neural-based combinatorial optimization methods for
end-to-end heuristic learning, out-of-distribution generalization remains a
challenge. In this paper, we present a novel formulation of Combinatorial
Optimization Problems (COPs) as Markov Decision Processes (MDPs) that
effectively leverages common symmetries of COPs to improve out-of-distribution
robustness. Starting from a direct MDP formulation of a constructive method, we
introduce a generic way to reduce the state space, based on Bisimulation
Quotienting (BQ) in MDPs. Then, for COPs with a recursive nature, we specialize
the bisimulation and show how the reduced state exploits the symmetries of
these problems and facilitates MDP solving. Our approach is principled and we
prove that an optimal policy for the proposed BQ-MDP actually solves the
associated COPs. We illustrate our approach on five classical problems: the
Euclidean and Asymmetric Traveling Salesman, Capacitated Vehicle Routing,
Orienteering and Knapsack Problems. Furthermore, for each problem, we introduce
a simple attention-based policy network for the BQ-MDPs, which we train by
imitation of (near) optimal solutions of small instances from a single
distribution. We obtain new state-of-the-art results for the five COPs on both
synthetic and realistic benchmarks. Notably, in contrast to most existing
neural approaches, our learned policies show excellent generalization
performance to much larger instances than seen during training, without any
additional search procedure.
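To make the state reduction concrete, consider the Euclidean TSP. In a direct constructive MDP, the state is the entire partial tour; but for a problem with this recursive structure, the remaining subproblem depends only on the origin city (where the tour must close), the current city, and the set of still-unvisited cities. The sketch below is an illustrative reconstruction under that reading, not the paper's code; the city names and the `reduced_state` helper are invented for the example.

```python
# Illustrative sketch (not the paper's implementation): map a full partial
# tour to its bisimulation-quotient state. All partial tours that share the
# same origin, current city, and unvisited set face the same remaining
# subproblem, so they collapse into one reduced state.

def reduced_state(partial_tour, all_cities):
    """Return the reduced state (origin, current, remaining-city set)."""
    origin, current = partial_tour[0], partial_tour[-1]
    remaining = frozenset(all_cities) - set(partial_tour)
    return (origin, current, remaining)

cities = ["A", "B", "C", "D", "E"]
# Two partial tours with a different interior order map to the same reduced
# state, so a policy defined on reduced states treats them identically:
s1 = reduced_state(["A", "B", "C", "D"], cities)
s2 = reduced_state(["A", "C", "B", "D"], cities)
print(s1 == s2)  # True
```

This only illustrates the recursive-tail symmetry; the paper additionally specializes the bisimulation per problem class (e.g. capacity tracking for CVRP) and proves that an optimal policy on the quotient MDP solves the original COP.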
Related papers
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, a last-iterate guarantee of a near-optimal policy, and guaranteed model accuracy.
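The paper's PSR-specific bonus term is not reproduced here, but the underlying optimism principle it applies can be illustrated with the classic UCB1 bandit index, where a count-based bonus upper-bounds the estimation error with high probability and shrinks as an arm is sampled more. This is a generic sketch, not the paper's algorithm:

```python
# Generic illustration of the UCB ("optimism in the face of uncertainty")
# principle via the classic UCB1 index -- unrelated to the paper's PSR bonus.
import math

def ucb1_index(mean_reward, pulls, total_pulls, c=2.0):
    """Empirical mean plus an exploration bonus; the bonus upper-bounds the
    estimation error w.h.p. and decays as the arm accumulates samples."""
    return mean_reward + math.sqrt(c * math.log(total_pulls) / pulls)

# A rarely pulled arm receives a larger bonus than a well-explored one,
# so ties in the empirical mean are broken toward the less-certain arm:
rare = ucb1_index(0.5, pulls=2, total_pulls=100)
common = ucb1_index(0.5, pulls=50, total_pulls=100)
print(rare > common)  # True
```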
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Semi-Infinitely Constrained Markov Decision Processes and Efficient Reinforcement Learning [17.04643707688075]
We consider a continuum of constraints instead of a finite number of constraints as in the case of ordinary CMDPs.
We devise two reinforcement learning algorithms for SICMDPs that we call SI-CRL and SI-CPO.
To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems.
arXiv Detail & Related papers (2023-04-29T12:52:38Z)
- Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs [24.256960622176305]
We propose the first (nearly) matching upper and lower bounds on the sample complexity of PAC RL in episodic Markov decision processes.
Our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap.
Their design and analyses employ novel ideas, including graph-theoretical concepts such as minimum flows and maximum cuts.
arXiv Detail & Related papers (2022-03-17T11:19:41Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing of the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- Pretrained Cost Model for Distributed Constraint Optimization Problems [37.79733538931925]
Distributed Constraint Optimization Problems (DCOPs) are an important subclass of optimization problems.
We propose a novel directed acyclic graph schema representation for DCOPs and leverage the Graph Attention Networks (GATs) to embed graph representations.
Our model, GAT-PCM, is then pretrained with optimally labelled data in an offline manner, so as to boost a broad range of DCOP algorithms.
arXiv Detail & Related papers (2021-12-08T09:24:10Z)
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Constrained Combinatorial Optimization with Reinforcement Learning [0.30938904602244344]
This paper presents a framework to tackle constrained optimization problems using deep Reinforcement Learning (RL).
We extend the Neural Combinatorial Optimization (NCO) theory in order to deal with constraints in its formulation.
In that context, the solution is iteratively constructed based on interactions with the environment.
arXiv Detail & Related papers (2020-06-22T03:13:07Z)
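Iterative construction under constraints is commonly realized in NCO by masking infeasible actions at every step. The following is a generic sketch of that device for a knapsack instance, with invented numbers and a greedy value/weight rule standing in for a learned policy; it is not the paper's method.

```python
# Hedged sketch: constructive solution building with constraint handling via
# action masking. At each step only feasible items are offered to the policy,
# so every constructed solution respects the capacity constraint by design.
weights = [3, 5, 2, 4]
values = [4, 6, 3, 5]
capacity = 8

def feasible_actions(chosen, remaining_capacity):
    """Mask: items not yet chosen that still fit in the knapsack."""
    return [i for i in range(len(weights))
            if i not in chosen and weights[i] <= remaining_capacity]

def policy(actions):
    """Greedy stand-in for a learned policy: best value/weight ratio."""
    return max(actions, key=lambda i: values[i] / weights[i])

chosen, cap = set(), capacity
while (acts := feasible_actions(chosen, cap)):
    a = policy(acts)
    chosen.add(a)
    cap -= weights[a]

print(sorted(chosen), sum(values[i] for i in chosen))  # [0, 2] 7
```

Because the mask shrinks the action set as capacity is consumed, the loop terminates with a feasible (here greedy, not necessarily optimal) solution.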
This list is automatically generated from the titles and abstracts of the papers in this site.