Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis
- URL: http://arxiv.org/abs/2512.08601v1
- Date: Tue, 09 Dec 2025 13:40:08 GMT
- Title: Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis
- Authors: Orit Davidovich, Shimrit Shtern, Segev Wasserkrug, Nimrod Megiddo,
- Abstract summary: We introduce a unified framework to model CO problems through Markov decision processes (MDPs) and solve them using reinforcement learning techniques.<n>We provide easy-to-test assumptions under which CO problems can be formulated as equivalent undiscounted MDPs that provide optimal solutions to the original CO problems.<n>Our convergence analysis provides: (1) a sufficient rate of increase in batch size and projected gradient descent steps at each RL; (2) the resulting optimality gap in terms of problem parameters and targeted RL accuracy; and (3) the importance of a choice of state-space embedding.
- Score: 3.7118625429889742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts to design heuristics per problem type. Due to their structure, many hard CO problems are amenable to treatment through reinforcement learning (RL). Indeed, we find a wealth of literature training NNs using value-based, policy gradient, or actor-critic approaches, with promising results, both in terms of empirical optimality gaps and inference runtimes. Nevertheless, there has been a paucity of theoretical work undergirding the use of RL for CO problems. To this end, we introduce a unified framework to model CO problems through Markov decision processes (MDPs) and solve them using RL techniques. We provide easy-to-test assumptions under which CO problems can be formulated as equivalent undiscounted MDPs that provide optimal solutions to the original CO problems. Moreover, we establish conditions under which value-based RL techniques converge to approximate solutions of the CO problem with a guarantee on the associated optimality gap. Our convergence analysis provides: (1) a sufficient rate of increase in batch size and projected gradient descent steps at each RL iteration; (2) the resulting optimality gap in terms of problem parameters and targeted RL accuracy; and (3) the importance of a choice of state-space embedding. Together, our analysis illuminates the success (and limitations) of the celebrated deep Q-learning algorithm in this problem context.
Related papers
- Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability [129.1296673737603]
Large language models (LLMs) have demonstrated strong reasoning capabilities through step-by-step chain-of-thought (CoT) reasoning.<n>A potential alternative is divide-and-conquer (DAC) reasoning, which decomposes a complex problem into subproblems to facilitate more effective exploration of the solution.<n>We propose an end-to-end reinforcement learning (RL) framework to enhance their DAC-style reasoning capacity.
arXiv Detail & Related papers (2026-02-02T18:54:54Z) - Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - UniCO: Towards a Unified Model for Combinatorial Optimization Problems [30.524941693870534]
Combinatorial Optimization (CO) encompasses a wide range of problems that arise in many real-world scenarios.<n>We introduce UniCO, a unified model for solving various CO problems.
arXiv Detail & Related papers (2025-05-07T12:39:33Z) - Unsupervised Training of Diffusion Models for Feasible Solution Generation in Neural Combinatorial Optimization [7.85458999849461]
We present IC/DC, an unsupervised CO framework that directly trains a diffusion model from scratch.<n>We train our model in a self-supervised way to minimize the cost of the solution while adhering to the problem-specific constraints.<n> IC/DC achieves state-of-the-art performance relative to existing NCO methods on the Parallel Machine Scheduling Problem (PMSP) and Asymmetric Traveling Salesman Problem (ATSP)
arXiv Detail & Related papers (2024-10-15T06:53:30Z) - Decision-focused Graph Neural Networks for Combinatorial Optimization [62.34623670845006]
An emerging strategy to tackle optimization problems involves the adoption of graph neural networks (GNNs) as an alternative to traditional algorithms.
Despite the growing popularity of GNNs and traditional algorithm solvers in the realm of CO, there is limited research on their integrated use and the correlation between them within an end-to-end framework.
We introduce a decision-focused framework that utilizes GNNs to address CO problems with auxiliary support.
arXiv Detail & Related papers (2024-06-05T22:52:27Z) - Leader Reward for POMO-Based Neural Combinatorial Optimization [8.301694061287565]
We propose Leader Reward to enhance the model's ability to generate optimal solutions.
We demonstrate that Leader Reward greatly improves the quality of the optimal solutions generated by the model.
arXiv Detail & Related papers (2024-05-22T19:27:03Z) - Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm provides comparable sublinear regret to the existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Understanding Curriculum Learning in Policy Optimization for Online
Combinatorial Optimization [66.35750142827898]
This paper presents the first systematic study on policy optimization methods for online CO problems.
We show that online CO problems can be naturally formulated as latent Markov Decision Processes (LMDPs), and prove convergence bounds on natural policy gradient (NPG)
Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift.
arXiv Detail & Related papers (2022-02-11T03:17:15Z) - Instance-Dependent Confidence and Early Stopping for Reinforcement
Learning [99.57168572237421]
Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure.
This research provides guarantees that explain textitex post the performance differences observed.
A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice.
arXiv Detail & Related papers (2022-01-21T04:25:35Z) - USCO-Solver: Solving Undetermined Stochastic Combinatorial Optimization
Problems [9.015720257837575]
We consider the regression between spaces, aiming to infer high-quality optimization solutions from samples of input-solution pairs.
For learning foundations, we present learning-error analysis under the PAC-Bayesian framework.
We obtain highly encouraging experimental results for several classic problems on both synthetic and real-world datasets.
arXiv Detail & Related papers (2021-07-15T17:59:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.