Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving
- URL: http://arxiv.org/abs/2509.23470v1
- Date: Sat, 27 Sep 2025 19:47:15 GMT
- Title: Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving
- Authors: Rui Ai, Hugo De Oliveira Barbalho, Sirui Li, Alexei Robsky, David Simchi-Levi, Ishai Menache
- Abstract summary: A common challenge in real-time operations is deciding whether to re-solve an optimization problem or continue using an existing solution. We propose a framework called Proximal Policy Optimization with Change Point Detection.
- Score: 18.62245790631018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common challenge in real-time operations is deciding whether to re-solve an optimization problem or continue using an existing solution. While modern data platforms may collect information at high frequencies, many real-time operations require repeatedly solving computationally intensive optimization problems formulated as Mixed-Integer Linear Programs (MILPs). Determining when to re-solve is, therefore, an economically important question. This problem poses several challenges: 1) How to characterize solution optimality and solving cost; 2) How to detect environmental changes and select beneficial samples for solving the MILP; 3) Given the large time horizon and non-MDP structure, vanilla reinforcement learning (RL) methods are not directly applicable and tend to suffer from value function explosion. Existing literature largely focuses on heuristics, low-data settings, and smooth objectives, with little focus on common NP-hard MILPs. We propose a framework called Proximal Policy Optimization with Change Point Detection (POC), which systematically offers a solution for balancing performance and cost when deciding appropriate re-solving times. Theoretically, we establish the relationship between the number of re-solves and the re-solving cost. To test our framework, we assemble eight synthetic and real-world datasets, and show that POC consistently outperforms existing baselines by 2%-17%. As a side benefit, our work fills the gap in the literature by introducing real-time MILP benchmarks and evaluation criteria.
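To make the re-solve decision concrete, here is a minimal sketch of a change-point-triggered re-solve rule. It is not the POC framework from the abstract: it swaps in a plain one-sided CUSUM detector and omits the learned policy and the MILP itself. All names (`CusumDetector`, `resolve_schedule`, `drift`, `threshold`) and the demand series are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CusumDetector:
    """One-sided CUSUM on the absolute deviation from a reference value."""
    drift: float       # slack: deviations smaller than this do not accumulate
    threshold: float   # alarm level for the accumulated statistic
    stat: float = 0.0

    def update(self, deviation: float) -> bool:
        # Accumulate evidence of change; clip at zero so noise decays away.
        self.stat = max(0.0, self.stat + abs(deviation) - self.drift)
        return self.stat > self.threshold

    def reset(self) -> None:
        self.stat = 0.0


def resolve_schedule(observations, drift=0.5, threshold=2.0):
    """Return the time steps at which a re-solve would be triggered.

    Assumption: the problem is solved once at t = 0, and each re-solve
    uses the current observation as the new reference.
    """
    detector = CusumDetector(drift, threshold)
    reference = observations[0]
    resolve_times = [0]
    for t, x in enumerate(observations[1:], start=1):
        if detector.update(x - reference):
            resolve_times.append(t)   # the costly MILP solve would happen here
            reference = x
            detector.reset()
    return resolve_times


# Hypothetical demand signal with a level shift at t = 4.
demand = [10.0, 10.2, 9.9, 10.1, 13.0, 13.1, 12.9, 13.0]
print(resolve_schedule(demand))  # prints [0, 4]
```

Raising `threshold` or `drift` trades solution freshness for fewer costly solves, which is the balance the abstract describes; a learned policy would replace these fixed constants.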
Related papers
- Applying a Random-Key Optimizer on Mixed Integer Programs [0.36700088931938835]
Mixed-Integer Programs (MIPs) are NP-hard optimization models that arise in a broad range of decision-making applications. This paper explores the use of the Random-Key Optimizer (RKO) framework as a flexible, metaheuristic alternative for computing high-quality solutions to MIPs.
arXiv Detail & Related papers (2026-02-25T18:20:03Z) - SolverLLM: Leveraging Test-Time Scaling for Optimization Problem via LLM-Guided Search [58.116954449750544]
We introduce a training-free framework that leverages test-time scaling to solve diverse optimization problems. Rather than solving directly, it generates mathematical formulations and translates them into solver-ready code, guided by a novel Monte Carlo Tree Search strategy.
arXiv Detail & Related papers (2025-10-19T16:21:19Z) - Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs [102.48588475875749]
We introduce Generative Self-Refinement (GSR), a novel parallel test-time scaling framework. GSR generates a set of candidate responses in parallel and then performs self-refinement to synthesize a new superior solution. We show that our method achieves state-of-the-art performance across five mathematical benchmarks.
arXiv Detail & Related papers (2025-08-27T06:51:48Z) - A Heuristic Algorithm Based on Beam Search and Iterated Local Search for the Maritime Inventory Routing Problem [0.45152963243489175]
The Maritime Inventory Routing Problem (MIRP) plays a crucial role in the integration of global maritime commerce levels. There are still no well-established methodologies capable of efficiently solving large MIRP instances or their variants.
arXiv Detail & Related papers (2025-05-17T22:40:36Z) - Autoformulation of Mathematical Optimization Models Using LLMs [50.030647274271516]
This paper approaches the problem of autoformulation: the automated creation of solver-ready optimization models from natural language problem descriptions. We identify three core challenges of autoformulation, including (1) the vast, problem-dependent hypothesis space and (2) efficient and diverse exploration of this space under uncertainty. We present a novel method leveraging Large Language Models with Monte-Carlo Tree Search, exploiting the hierarchical nature of optimization modeling to generate and systematically explore possible formulations.
arXiv Detail & Related papers (2024-11-03T20:41:38Z) - Automatic MILP Solver Configuration By Learning Problem Similarities [1.1585113506994469]
Mixed-Integer Linear Programming (MILP) solvers expose numerous configuration parameters to control their internal algorithms.
We aim to predict configuration parameters for unseen problem instances that yield lower-cost solutions without the time overhead of searching-and-evaluating configurations.
We show that instances that have similar costs using one solver configuration also have similar costs using another solver configuration in the same runtime environment.
arXiv Detail & Related papers (2023-07-02T21:31:47Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems.
We show that for weakly-convex objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaLASSO (adaptive least absolute shrinkage and selection operator).
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z) - Learning to Schedule Heuristics in Branch-and-Bound [25.79025327341732]
Real-world applications typically require finding good solutions early in the search to enable fast decision-making.
We propose the first data-driven framework for scheduling heuristics in an exact MIP solver.
Compared to the default settings of a state-of-the-art academic MIP solver, we are able to reduce the average primal integral by up to 49% on a class of challenging instances.
arXiv Detail & Related papers (2021-03-18T14:49:52Z) - Contrastive Losses and Solution Caching for Predict-and-Optimize [19.31153168397003]
We use a Noise Contrastive approach to motivate a family of surrogate loss functions.
We address a major bottleneck of all predict-and-optimize approaches.
We show that even a very slow growth rate is enough to match the quality of state-of-the-art methods.
arXiv Detail & Related papers (2020-11-10T19:09:12Z) - Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling the automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.