Collab-Solver: Collaborative Solving Policy Learning for Mixed-Integer Linear Programming
- URL: http://arxiv.org/abs/2508.03030v1
- Date: Tue, 05 Aug 2025 03:16:04 GMT
- Title: Collab-Solver: Collaborative Solving Policy Learning for Mixed-Integer Linear Programming
- Authors: Siyuan Li, Yifan Yu, Yanchen Deng, Zhihao Zhang, Mengjing Chen, Fangzhou Zhu, Tao Zhong, Jianye Hao, Peng Liu, Bo An
- Abstract summary: We propose Collab-Solver, a novel multi-agent-based policy learning framework for MILP solving. Specifically, we formulate the collaboration of cut selection and branching in MILP solving as a Stackelberg game. The jointly learned policy significantly improves the solving performance on both synthetic and large-scale real-world MILP datasets.
- Score: 57.44900640134789
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed-integer linear programming (MILP) has been a fundamental problem in combinatorial optimization. Previous works have designed a plethora of hard-coded heuristics to accomplish challenging MILP solving with domain knowledge. Driven by the high capability of neural networks, recent research is devoted to replacing manually designed heuristics with learned policies. Although learning-based MILP methods have shown great promise, existing works independently treat the policy learning in each module of MILP solvers without considering their interdependence, severely hurting the solving speed and quality. To address this issue, we propose a novel multi-agent-based policy learning framework for MILP (Collab-Solver), which can collaboratively optimize the policies for multiple modules. Specifically, we formulate the collaboration of cut selection and branching in MILP solving as a Stackelberg game. Under this formulation, we develop a two-phase learning paradigm to stabilize the collaborative policy learning, where the first phase achieves the data-communicated policy pretraining and the second phase further orchestrates the policy learning for various modules. The jointly learned policy significantly improves the solving performance on both synthetic and large-scale real-world MILP datasets. Moreover, the policies learned by Collab-Solver have also demonstrated excellent generalization abilities across different instance sets.
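To make the described pipeline concrete, below is a minimal sketch of how a Stackelberg-style two-phase training loop over a cut-selection (leader) policy and a branching (follower) policy could be organized. The solver interface, reward signal, and hill-climbing update are stand-ins chosen for illustration; none of the names (LinearScorer, solve_episode, perturb_update) come from the paper.

```python
# Minimal sketch of a two-phase, Stackelberg-style collaborative training loop.
# NOT the authors' implementation: the reward and updates are placeholders.
import numpy as np

rng = np.random.default_rng(0)

class LinearScorer:
    """Scores candidates (cuts or branching variables) from feature vectors."""
    def __init__(self, dim):
        self.w = rng.normal(size=dim)

    def select(self, feats):                      # feats: (num_candidates, dim)
        return int(np.argmax(feats @ self.w))

def solve_episode(cut_policy, branch_policy):
    """Stub for one MILP solve: apply both policies and return a surrogate
    reward standing in for, e.g., negative solving time."""
    cut = cut_policy.select(rng.normal(size=(10, 4)))
    var = branch_policy.select(rng.normal(size=(10, 4)))
    return -abs(cut - var) - rng.random()

def perturb_update(policy, episode_fn, iters=20, sigma=0.1):
    """Hill-climb one policy while the other stays fixed (a crude stand-in for
    the RL update used in practice)."""
    best = np.mean([episode_fn() for _ in range(8)])
    for _ in range(iters):
        backup = policy.w.copy()
        policy.w = backup + sigma * rng.normal(size=backup.shape)
        score = np.mean([episode_fn() for _ in range(8)])
        if score < best:
            policy.w = backup                     # revert if no improvement
        else:
            best = score
    return best

leader, follower = LinearScorer(4), LinearScorer(4)   # cut selection / branching
episode = lambda: solve_episode(leader, follower)

# Phase 1: data-communicated pretraining, approximated here by independent
# warm-up updates for each policy.
perturb_update(leader, episode)
perturb_update(follower, episode)

# Phase 2: Stackelberg-style orchestration -- the follower best-responds to the
# fixed leader, then the leader updates against the adapted follower.
for _ in range(3):
    perturb_update(follower, episode)
    reward = perturb_update(leader, episode)
print(f"surrogate reward after joint training: {reward:.3f}")
```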
Related papers
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language Models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
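As a rough illustration of mixture-of-agents decoding at inference time, the toy sketch below blends next-token distributions from several fixed policies with a simple log-linear rule; the mixing rule and all names are assumptions made for illustration, not the method from the paper.

```python
# Toy mixture-of-agents decoding: combine next-token distributions from several
# off-the-shelf policies at inference time (log-linear blend, illustrative only).
import numpy as np

rng = np.random.default_rng(1)
vocab, steps, num_agents = 50, 5, 3

def next_token_dist(agent_id, prefix):
    """Stand-in for an LLM forward pass returning p(token | prefix)."""
    logits = rng.normal(size=vocab) + 0.1 * agent_id
    return np.exp(logits) / np.exp(logits).sum()

prefix, weights = [], np.array([0.5, 0.3, 0.2])       # per-agent mixture weights
for _ in range(steps):
    dists = np.stack([next_token_dist(k, prefix) for k in range(num_agents)])
    log_mix = weights @ np.log(dists)                 # log-linear combination
    probs = np.exp(log_mix - log_mix.max())
    probs /= probs.sum()
    prefix.append(int(rng.choice(vocab, p=probs)))
print("decoded token ids:", prefix)
```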
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
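A conceptual sketch of the "best combination of baselines" target: for each state, pick the baseline whose logged trajectories achieved the highest return there and imitate its action. The tabular setup and random data below are purely illustrative, not the algorithm studied in the paper.

```python
# Per-state "best baseline" combination on toy tabular data (conceptual sketch).
import numpy as np

rng = np.random.default_rng(2)
num_states, num_actions, K = 6, 3, 4

# returns[k, s] = empirical return of baseline k from state s (stubbed),
# actions[k, s] = action baseline k takes in state s (stubbed).
returns = rng.normal(size=(K, num_states))
actions = rng.integers(0, num_actions, size=(K, num_states))

best_baseline = returns.argmax(axis=0)                 # per-state best baseline
combined_policy = actions[best_baseline, np.arange(num_states)]
print("per-state chosen baseline:", best_baseline)
print("state-wise imitation targets:", combined_policy)
```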
arXiv Detail & Related papers (2024-03-28T14:34:02Z) - Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation [62.738582127114704]
Adar is a framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) solvers.
arXiv Detail & Related papers (2023-10-22T03:15:36Z) - Federated Multi-Objective Learning [22.875284692358683]
We propose a new federated multi-objective learning (FMOL) framework with multiple clients.
Our FMOL framework allows a different set of objective functions across different clients to support a wide range of applications.
For this FMOL framework, we propose two new federated multi-objective optimization (FMOO) algorithms: federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA).
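The sketch below illustrates the general shape of such federated multi-objective updates, assuming a simplified variant in which each client combines its local objective gradients with equal weights (instead of MGDA-style min-norm weights) and the server averages the client directions.

```python
# Simplified federated multi-objective optimization round: per-client gradient
# combination (equal weights here, for illustration) plus server-side averaging.
import numpy as np

rng = np.random.default_rng(3)
dim, num_clients = 5, 4
x = rng.normal(size=dim)                          # shared model parameters

def client_objective_grads(client_id, x):
    """Stub: gradients of this client's two local quadratic objectives."""
    targets = rng.normal(size=(2, dim)) + client_id
    return [2 * (x - t) for t in targets]

for _ in range(10):
    directions = []
    for c in range(num_clients):
        grads = client_objective_grads(c, x)
        directions.append(np.mean(grads, axis=0))  # equal-weight combination
    x = x - 0.05 * np.mean(directions, axis=0)     # server averaging step
print("shared parameters after 10 rounds:", np.round(x, 3))
```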
arXiv Detail & Related papers (2023-10-15T15:45:51Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
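A minimal sketch of that idea, assuming plain linear projections: each modality's features are mapped into a shared space and fused with an order- and subset-agnostic mean, so unseen modality combinations can still be handled at test time. The dimensions and random maps are illustrative stand-ins for the learned module.

```python
# Project variable-dimensional modality features into one shared space and fuse
# whatever subset of modalities is present (illustrative linear maps only).
import numpy as np

rng = np.random.default_rng(4)
shared_dim = 8
modality_dims = {"video": 16, "audio": 12, "text": 10}
projections = {m: rng.normal(size=(d, shared_dim)) / np.sqrt(d)
               for m, d in modality_dims.items()}

def fuse(features):
    """features: dict modality -> feature vector; any subset may be present."""
    projected = [features[m] @ projections[m] for m in features]
    return np.mean(projected, axis=0)             # order/number agnostic fusion

# A modality pair unseen during "training" can still be fused at test time.
sample = {"audio": rng.normal(size=12), "text": rng.normal(size=10)}
print("fused representation:", np.round(fuse(sample), 3))
```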
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem [22.385585755496116]
Existing Multi-Agent Reinforcement Learning (MARL) methods are online and thus impractical for real-world applications in which collecting new interactions is costly or dangerous.
We identify and formalize the strategy agreement (SA) and the strategy fine-tuning (SFT) coordination challenges.
Our resulting algorithm, Model-based Offline Multi-Agent Proximal Policy Optimization (MOMA-PPO), generates synthetic interaction data and enables agents to converge on a strategy while fine-tuning their policies accordingly.
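The toy loop below sketches the model-based pattern this describes: a learned dynamics model (stubbed here) generates synthetic joint rollouts, and the agents' policies are adjusted on those rollouts. The model, policies, and update rule are trivial stand-ins, not MOMA-PPO itself.

```python
# Model-based offline MARL pattern: synthetic rollouts from a learned model,
# then a policy adjustment step (all components are trivial stand-ins).
import numpy as np

rng = np.random.default_rng(5)
state_dim, num_agents = 3, 2

def learned_model(state, joint_action):
    """Stub dynamics model: next state and shared reward."""
    next_state = state + 0.1 * np.sum(joint_action) + 0.01 * rng.normal(size=state_dim)
    return next_state, -np.linalg.norm(next_state)

policies = [rng.normal(size=state_dim) for _ in range(num_agents)]  # linear actors

for _ in range(5):
    # Generate synthetic interaction data from the model (no environment access).
    state, rollout = rng.normal(size=state_dim), []
    for _ in range(10):
        joint_action = [float(np.tanh(p @ state)) for p in policies]
        state, reward = learned_model(state, joint_action)
        rollout.append(reward)
    # Stand-in for the PPO fine-tuning step on the synthetic rollout.
    for p in policies:
        p += 0.01 * np.mean(rollout) * rng.normal(size=state_dim)
print("mean synthetic-rollout reward:", round(float(np.mean(rollout)), 3))
```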
arXiv Detail & Related papers (2023-05-26T18:43:16Z) - Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
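For reference, GPI over a set of policies with vector-valued Q-functions reduces, for a preference weight w, to acting greedily with respect to the maximum over policies of w·Q_i(s, a); the snippet below computes exactly that quantity on toy data (the prioritization scheme built on top of it is not shown).

```python
# Generalized Policy Improvement over stored policies with vector-valued Q.
import numpy as np

rng = np.random.default_rng(6)
num_policies, num_actions, num_objectives = 3, 4, 2

# Q[i, a, :] = vector value of action a in the current state under policy i.
Q = rng.normal(size=(num_policies, num_actions, num_objectives))
w = np.array([0.7, 0.3])                          # current preference weights

scalarized = Q @ w                                # (num_policies, num_actions)
gpi_values = scalarized.max(axis=0)               # max over policies per action
print("GPI value per action:", np.round(gpi_values, 3))
print("GPI-greedy action:", int(gpi_values.argmax()))
```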
arXiv Detail & Related papers (2023-01-18T20:54:40Z) - Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer [7.970144204429356]
We introduce an SF-based extension of the Optimistic Linear Support algorithm to learn a set of policies whose SFs form a convex coverage set.
We prove that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible tasks.
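The core transfer mechanism can be sketched in a few lines, under the assumption that rewards are linear in known features: action values for an unseen task weight vector are read off directly from the stored successor features and combined with GPI, with no further learning. The data below is random and purely illustrative.

```python
# Zero-shot transfer with successor features: Q_i(s, a) = psi_i(s, a) . w_new,
# then GPI over the stored policies.
import numpy as np

rng = np.random.default_rng(7)
num_policies, num_actions, feat_dim = 4, 3, 5

# psi[i, a, :] = successor features of policy i for action a in the current state.
psi = rng.normal(size=(num_policies, num_actions, feat_dim))
w_new = rng.normal(size=feat_dim)                 # reward weights of an unseen task

q_new = psi @ w_new                               # (num_policies, num_actions)
transfer_action = int(q_new.max(axis=0).argmax()) # GPI over the stored policies
print("action chosen for the new task:", transfer_action)
```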
arXiv Detail & Related papers (2022-06-22T19:00:08Z) - MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to the series of methods nested with reinforcement learning (RL) algorithms.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z) - Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
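As a rough illustration of the underlying representation, the sketch below keeps a function as a sparse Gaussian-kernel expansion over a growing dictionary and prunes elements with negligible coefficients; it shows only the sparse RKHS-style representation, not the paper's policy-learning algorithm.

```python
# Sparse kernel (RKHS-style) function representation with coefficient pruning.
import numpy as np

rng = np.random.default_rng(8)

def kernel(a, b, bw=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * bw ** 2))

dictionary, coeffs = [], []        # kernel expansion f(x) = sum_i c_i k(x, x_i)

def f(x):
    return sum(c * kernel(x, xi) for c, xi in zip(coeffs, dictionary))

# Stochastic updates toward noisy targets, followed by pruning of small weights.
for _ in range(200):
    x = rng.uniform(-2, 2, size=2)
    target = np.sin(x[0]) * np.cos(x[1])          # toy target function
    error = target - f(x)
    dictionary.append(x)
    coeffs.append(0.2 * error)                    # add a new kernel element
    keep = [i for i, c in enumerate(coeffs) if abs(c) > 0.02]
    dictionary = [dictionary[i] for i in keep]    # drop negligible coefficients
    coeffs = [coeffs[i] for i in keep]
print(f"dictionary size after sparsification: {len(dictionary)}")
```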
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.