CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound
- URL: http://arxiv.org/abs/2512.11169v1
- Date: Thu, 11 Dec 2025 23:20:13 GMT
- Title: CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound
- Authors: Akhil S Anand, Elias Aarekol, Martin Mziray Dalseg, Magnus Stalhane, Sebastien Gros
- Abstract summary: Combinatorial sequential decision-making problems are typically modeled as mixed integer linear programs (MILPs). The inherent difficulty of modeling MILPs that accurately represent real-world problems leads to suboptimal performance in practice. We introduce a proof-of-concept CORL framework that end-to-end fine-tunes an MILP scheme using reinforcement learning (RL) on real-world data to maximize its operational performance.
- Score: 2.070419973405808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Combinatorial sequential decision-making problems are typically modeled as mixed integer linear programs (MILPs) and solved via branch-and-bound (B&B) algorithms. The inherent difficulty of modeling MILPs that accurately represent stochastic real-world problems leads to suboptimal performance in practice. Recently, machine learning methods have been applied to build MILP models for decision quality rather than for how accurately they model the real-world problem. However, these approaches typically rely on supervised learning, assume access to true optimal decisions, and use surrogates for the MILP gradients. In this work, we introduce a proof-of-concept CORL framework that end-to-end fine-tunes an MILP scheme using reinforcement learning (RL) on real-world data to maximize its operational performance. We enable this by casting an MILP solved by B&B as a differentiable stochastic policy compatible with RL. We validate the CORL method in a simple illustrative combinatorial sequential decision-making example.
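The core idea in the abstract above can be illustrated with a toy sketch. Note this is not the paper's implementation: the knapsack instance, the Gaussian objective-perturbation scheme, and all function names below are illustrative assumptions, with a brute-force solve standing in for a real B&B solver. The sketch shows how an MILP solver can be wrapped as a stochastic policy and tuned with a score-function (REINFORCE-style) gradient against a real-world reward the MILP objective does not model exactly.

```python
import math
import random
from itertools import product

random.seed(0)

WEIGHTS = [2, 3, 4, 5]   # toy knapsack item weights
CAPACITY = 7

def solve_milp(costs):
    """Stand-in for a B&B solver: brute-force the binary knapsack
    max sum(costs[i]*x[i]) s.t. sum(WEIGHTS[i]*x[i]) <= CAPACITY."""
    best_x, best_val = None, -math.inf
    for x in product([0, 1], repeat=len(costs)):
        if sum(w * xi for w, xi in zip(WEIGHTS, x)) <= CAPACITY:
            val = sum(c * xi for c, xi in zip(costs, x))
            if val > best_val:
                best_x, best_val = x, val
    return best_x

def stochastic_policy(theta, sigma=0.3):
    """Perturb the learned objective coefficients, then solve the MILP.
    The Gaussian perturbation makes the MILP decision a stochastic policy."""
    eps = [random.gauss(0.0, sigma) for _ in theta]
    x = solve_milp([t + e for t, e in zip(theta, eps)])
    return x, eps

def reinforce_step(theta, reward_fn, lr=0.05, sigma=0.3):
    """One score-function gradient step: the Gaussian score eps/sigma^2
    weighted by the observed real-world reward."""
    x, eps = stochastic_policy(theta, sigma)
    r = reward_fn(x)
    return [t + lr * r * e / sigma**2 for t, e in zip(theta, eps)]

# "Real-world" reward that the initial MILP objective does not capture:
true_values = [3.0, 1.0, 5.0, 4.0]
reward = lambda x: sum(v * xi for v, xi in zip(true_values, x))

theta = [1.0, 1.0, 1.0, 1.0]   # initial (misspecified) objective coefficients
for _ in range(500):
    theta = reinforce_step(theta, reward)
```

The perturbed-objective trick is one simple way to make an argmax-style solver differentiable in expectation; the paper's actual differentiable-policy construction may differ.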
Related papers
- FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming [52.52020895303244]
Mixed-Integer Linear Programming (MILP) is a foundational tool for complex decision-making problems. We propose Joint Continuous-Integer Flow for Mixed-Integer Linear Programming (FMIP), the first generative framework that models the joint distribution of both integer and continuous variables of MILP solutions. FMIP is fully compatible with arbitrary backbone networks and various downstream solvers, making it well-suited for a broad range of real-world MILP applications.
arXiv Detail & Related papers (2025-07-31T10:03:30Z) - Fast and Interpretable Mixed-Integer Linear Program Solving by Learning Model Reduction [24.3088703166792]
This paper aims to learn a reduced and equivalent model of the original MILP as an intermediate step. The reduced model often corresponds to interpretable operations and is much simpler, enabling large-scale MILP problems to be solved much faster than with existing commercial solvers. We introduce an attention mechanism to capture and represent preference information, which helps improve the performance of model-reduction learning tasks.
arXiv Detail & Related papers (2024-12-31T06:50:42Z) - Towards An Unsupervised Learning Scheme for Efficiently Solving Parameterized Mixed-Integer Programs [6.1860817947800655]
We train an autoencoder (AE) for binary variables in an unsupervised fashion. We present a strategy to construct a class of cutting-plane constraints from the decoder parameters of an offline-trained AE. Their integration into the primal MIP problem leads to a tightened MIP with a reduced feasible region.
arXiv Detail & Related papers (2024-12-23T14:48:32Z) - Multi-task Representation Learning for Mixed Integer Linear Programming [13.106799330951842]
This paper introduces the first multi-task learning framework for ML-guided MILP solving. We demonstrate that our multi-task learning model performs similarly to specialized models within the same distribution, and that it significantly outperforms them in generalization across problem sizes and tasks.
arXiv Detail & Related papers (2024-12-18T23:33:32Z) - RL-SPH: Learning to Achieve Feasible Solutions for Integer Linear Programs [3.3894236476098185]
We propose RL-SPH, a novel reinforcement learning-based start primal heuristic capable of independently generating feasible solutions even for ILPs involving non-binary integers. Experimental results demonstrate that RL-SPH rapidly obtains high-quality feasible solutions, achieving on average a 44x lower primal gap and a 2.3x lower primal integral compared to existing primal heuristics.
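For context on what a start primal heuristic does, the sketch below greedily constructs a feasible solution for a small ILP with non-binary integer variables. The instance and the greedy scoring rule are illustrative assumptions; in RL-SPH the variable scoring would be produced by a learned RL policy, which is not reproduced here.

```python
# Toy ILP: maximize C.x subject to A x <= B, x integer in [0, UB].
# A start primal heuristic only needs to return *some* feasible x quickly.
C = [4.0, 3.0, 5.0]                 # objective coefficients
A = [[2.0, 1.0, 3.0],               # constraint matrix rows
     [1.0, 2.0, 1.0]]
B = [10.0, 8.0]
UB = [3, 3, 3]                      # integer upper bounds (non-binary)

def feasible(x):
    """Check A x <= B row by row."""
    return all(sum(a[j] * x[j] for j in range(len(x))) <= b
               for a, b in zip(A, B))

def greedy_start_solution():
    """Repeatedly increment the variable with the best objective gain
    per unit of constraint usage, as long as feasibility is preserved.
    This fixed scoring rule stands in for the learned policy."""
    x = [0] * len(C)
    improved = True
    while improved:
        improved = False
        order = sorted(range(len(C)),
                       key=lambda j: C[j] / sum(a[j] for a in A),
                       reverse=True)
        for j in order:
            if x[j] < UB[j]:
                x[j] += 1
                if feasible(x):
                    improved = True
                    break
                x[j] -= 1
    return x

x0 = greedy_start_solution()   # a feasible warm start for a B&B solver
```

A solution like `x0` would typically be passed to a MIP solver as a warm start to tighten the primal bound early in the B&B search.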
arXiv Detail & Related papers (2024-11-29T07:23:34Z) - Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for general-sum Markov games (MGs) under general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm achieves sublinear regret comparable to existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z) - POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance [0.7046417074932257]
We propose a combined framework for inference and robust solution of POMDPs via deep RL.
First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model.
The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization.
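Where domain randomization enters that loop can be sketched as follows. This is an illustrative toy, not the paper's railway model: the two-state maintenance POMDP, the parameter name, and the posterior samples (standing in for MCMC output) are all assumptions, and the deep RL update is omitted in favor of evaluating a fixed heuristic policy.

```python
import random

random.seed(1)

# Posterior samples of the deterioration probability, standing in for
# the output of MCMC inference over the hidden Markov model.
posterior_samples = [0.10, 0.12, 0.15, 0.11, 0.13]

def run_episode(p_deteriorate, policy, horizon=20):
    """Roll out one episode of a toy two-state maintenance POMDP:
    state 0 = good, 1 = degraded; action 1 = repair (costly, resets state)."""
    state, total_reward = 0, 0.0
    for _ in range(horizon):
        obs = state if random.random() < 0.8 else 1 - state  # noisy observation
        action = policy(obs)
        if action == 1:          # repair
            total_reward -= 5.0
            state = 0
        elif state == 1:         # operating while degraded
            total_reward -= 2.0
        if state == 0 and random.random() < p_deteriorate:
            state = 1
        total_reward += 1.0      # revenue per step of operation
    return total_reward

def evaluate_with_domain_randomization(policy, episodes=100):
    """Each episode draws model parameters from the posterior, so the
    policy is trained/evaluated across the parameter uncertainty.
    A real implementation would run a deep RL update per episode."""
    returns = []
    for _ in range(episodes):
        p = random.choice(posterior_samples)   # domain randomization
        returns.append(run_episode(p, policy))
    return sum(returns) / len(returns)

repair_on_bad_obs = lambda obs: 1 if obs == 1 else 0
avg = evaluate_with_domain_randomization(repair_on_bad_obs)
```

Sampling a fresh parameter vector per episode is the standard way domain randomization exposes the learner to the full posterior rather than a single point estimate.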
arXiv Detail & Related papers (2023-07-16T15:44:58Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z) - A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs [91.07247251502564]
We propose a hybrid approach that combines the best of both worlds: a bi-level framework with an upper-level learning method that optimizes the graph.
Such a bi-level approach simplifies learning on the original hard CO problem and effectively mitigates the demand for model capacity.
arXiv Detail & Related papers (2021-06-09T09:18:18Z) - SOLO: Search Online, Learn Offline for Combinatorial Optimization Problems [4.777801093677586]
We study problems with real world applications such as machine scheduling, routing, and assignment.
We propose a method that combines Reinforcement Learning (RL) and planning.
The method applies equally to the offline and online variants of the problem; in the online setting, problem components are not known in advance but arrive during the decision-making process.
arXiv Detail & Related papers (2021-04-04T17:12:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.