A Markov Decision Process for Variable Selection in Branch & Bound
- URL: http://arxiv.org/abs/2510.19348v1
- Date: Wed, 22 Oct 2025 08:15:32 GMT
- Title: A Markov Decision Process for Variable Selection in Branch & Bound
- Authors: Paul Strang, Zacharie Alès, Côme Bissuel, Olivier Juan, Safia Kedad-Sidhoum, Emmanuel Rachelson
- Abstract summary: A key factor influencing the performance of Branch and Bound solvers is the variable selection heuristic governing branching decisions. Recent contributions have sought to adapt reinforcement learning algorithms to the B&B setting to learn optimal branching policies. We introduce BBMDP, a principled vanilla MDP formulation for variable selection in B&B, which makes it possible to leverage a broad range of RL algorithms.
- Score: 4.802758600019421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixed-Integer Linear Programming (MILP) is a powerful framework used to address a wide range of NP-hard combinatorial optimization problems, often solved by Branch and Bound (B&B). A key factor influencing the performance of B&B solvers is the variable selection heuristic governing branching decisions. Recent contributions have sought to adapt reinforcement learning (RL) algorithms to the B&B setting to learn optimal branching policies, through Markov Decision Process (MDP) inspired formulations, and ad hoc convergence theorems and algorithms. In this work, we introduce BBMDP, a principled vanilla MDP formulation for variable selection in B&B, which makes it possible to leverage a broad range of RL algorithms for the purpose of learning optimal B&B heuristics. Computational experiments validate our model empirically, as our branching agent outperforms prior state-of-the-art RL agents on four standard MILP benchmarks.
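Concretely, casting branching as a vanilla MDP means episodes in which each state is a B&B search state and each action selects one candidate variable to branch on. The sketch below shows that loop using the Ecole library, which exposes exactly this interface; Ecole is our illustrative assumption (the paper does not state its tooling here), and the first-candidate rule stands in for a learned policy.

```python
# A minimal sketch of the branching-as-MDP episode loop, assuming the
# Ecole library as the environment (an illustrative choice, not the
# paper's stated tooling).
import ecole

env = ecole.environment.Branching(
    observation_function=ecole.observation.NodeBipartite()
)
instances = ecole.instance.SetCoverGenerator(n_rows=100, n_cols=200)

for _ in range(5):  # a few episodes over generated MILP instances
    obs, action_set, reward_offset, done, info = env.reset(next(instances))
    while not done:
        action = action_set[0]  # placeholder: a trained agent would score candidates
        obs, action_set, reward, done, info = env.step(action)
```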
Related papers
- Planning in Branch-and-Bound: Model-Based Reinforcement Learning for Exact Combinatorial Optimization [5.089078998562186]
Mixed-Integer Linear Programming (MILP) lies at the core of many real-world combinatorial optimization (CO) problems, traditionally solved by branch-and-bound (B&B). We introduce Plan-and-Branch-and-Bound (PlanB&B), a model-based reinforcement learning (MBRL) agent that leverages a learned internal model of the B&B dynamics to discover improved branching strategies.
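For context, the model-based pattern this abstract describes, learning an internal model and planning with it, is classically illustrated by Dyna-Q; the tabular sketch below shows that generic pattern only, not PlanB&B's architecture.

```python
import random
from collections import defaultdict

# Generic tabular Dyna-Q (illustrative only): learn a model of observed
# transitions, then plan by replaying simulated experience from it.
Q = defaultdict(float)   # Q[(state, action)] -> value estimate
model = {}               # model[(state, action)] -> (reward, next_state)
alpha, gamma, n_planning = 0.1, 0.99, 10

def dyna_q_update(s, a, r, s2, actions):
    # Direct RL update from the real transition.
    best = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
    model[(s, a)] = (r, s2)
    # Planning: extra updates from transitions sampled from the model.
    for _ in range(n_planning):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        pbest = max(Q[(ps2, a2)] for a2 in actions)
        Q[(ps, pa)] += alpha * (pr + gamma * pbest - Q[(ps, pa)])
```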
arXiv Detail & Related papers (2025-11-12T11:28:08Z)
- Revisiting LLM Reasoning via Information Bottleneck [57.519119962528166]
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). We present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle. We propose IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable.
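As a reference point, the classical IB objective behind this "informative yet generalizable" trade-off is the Lagrangian below; how IBRO adapts it to reasoning trajectories is specified in the paper, not here.

```latex
% Classical information bottleneck Lagrangian (our reference, not IBRO itself):
% learn a representation Z of input X that stays predictive of target Y
% while compressing away the rest.
\max_{p(z \mid x)} \; I(Z; Y) \;-\; \beta \, I(X; Z)
```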
arXiv Detail & Related papers (2025-07-24T13:14:25Z)
- Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes [7.028778922533688]
Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. We introduce Reward-Extended Differential (or RED) reinforcement learning: a novel RL framework that can be used to effectively and efficiently solve various learning objectives, or subtasks, simultaneously in the average-reward setting.
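In the average-reward setting, a learned estimate of the long-run reward rate replaces the discount factor. The sketch below shows standard differential Q-learning, the kind of base update RED builds on; the subtask-driven machinery that is the paper's contribution is not shown.

```python
from collections import defaultdict

# Standard differential Q-learning for average-reward MDPs: the TD error
# is taken relative to a running reward-rate estimate rho rather than
# using a discount factor. (Base algorithm only, not RED itself.)
Q = defaultdict(float)
rho = 0.0                      # estimate of the long-run average reward
alpha, eta = 0.1, 0.5

def differential_q_update(s, a, r, s2, actions):
    global rho
    best = max(Q[(s2, a2)] for a2 in actions)
    delta = r - rho + best - Q[(s, a)]   # differential TD error
    Q[(s, a)] += alpha * delta
    rho += eta * alpha * delta           # reward-rate estimate tracks delta
```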
arXiv Detail & Related papers (2024-10-14T14:52:23Z)
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
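The abstract leaves the search mechanics implicit; for orientation, most MCTS variants select nodes with the generic UCT rule sketched below, onto which LLaMA-Berry layers Self-Refine and its pairwise reward model (see the paper for those parts).

```python
import math

# Generic UCT selection (not LLaMA-Berry-specific): pick the child that
# best balances mean value (exploitation) against visit counts (exploration).
def uct_select(children, c=1.414):
    # children: list of dicts with positive "visits" and accumulated "value".
    total = sum(ch["visits"] for ch in children)
    return max(
        children,
        key=lambda ch: ch["value"] / ch["visits"]
        + c * math.sqrt(math.log(total) / ch["visits"]),
    )
```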
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
- Contextual Bilevel Reinforcement Learning for Incentive Alignment [42.22085862132403]
We introduce Contextual Bilevel Reinforcement Learning (CB-RL), a bilevel decision-making model. CB-RL can be viewed as a Stackelberg game where the leader and a random context beyond the leader's control together decide the setup of many MDPs. This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as reward shaping, contract theory and mechanism design.
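Schematically, and in notation of our own choosing rather than the paper's, the Stackelberg structure reads: the leader commits to parameters, a context is drawn beyond its control, and each induced MDP is solved by a best-responding follower.

```latex
% Schematic bilevel / Stackelberg formulation (our notation, not the paper's):
% leader picks theta before the context c is revealed; followers best-respond.
\max_{\theta} \;\; \mathbb{E}_{c \sim \mathcal{C}}
  \Big[ F\big(\theta, c, \pi^{*}_{c}\big) \Big]
\quad \text{s.t.} \quad
\pi^{*}_{c} \in \arg\max_{\pi} \; J_{c}(\pi; \theta)
```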
arXiv Detail & Related papers (2024-06-03T17:54:39Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
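In generic optimism-in-the-face-of-uncertainty terms (our schematic notation, not the paper's construction), a UCB-type method of this kind acts greedily with respect to an optimistic value whose exploration bonus dominates the model estimation error:

```latex
% Schematic UCB-type selection (our notation): the bonus b_t is designed to
% upper-bound the estimated-vs-true model error in total variation, so the
% optimistic value never undershoots the truth.
\pi_{t} = \arg\max_{\pi} \Big( \widehat{V}^{\pi}_{t} + b_{t}(\pi) \Big),
\qquad b_{t}(\pi) \;\ge\; C \cdot \mathrm{TV}\big(\widehat{P}_{t}, P^{\star}\big)
```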
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Model-free Reinforcement Learning for Branching Markov Decision Processes [6.402126624793774]
We study reinforcement learning for the optimal control of Branching Markov Decision Processes.
The state of a (discrete-time) BMDP is a collection of entities that, while spawning other entities, generate a payoff.
We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit.
arXiv Detail & Related papers (2021-06-12T13:42:15Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
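A standard way to build the permutation invariance credited here is Deep-Sets-style pooling: embed each agent independently, then mean-pool, so no ordering of the agents can change the output. The sketch below is our own minimal construction, not MF-PPO's exact network.

```python
import torch
import torch.nn as nn

# Permutation-invariant critic via mean pooling (a minimal construction,
# not the paper's exact architecture): reordering agents cannot change
# the pooled embedding, hence not the value either.
class MeanFieldCritic(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs):                   # obs: (n_agents, obs_dim)
        pooled = self.embed(obs).mean(dim=0)  # invariant to agent order
        return self.head(pooled)

critic = MeanFieldCritic(obs_dim=8)
obs = torch.randn(5, 8)
assert torch.allclose(critic(obs), critic(obs[torch.randperm(5)]), atol=1e-5)
```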
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies [76.83991682238666]
Branch and Bound (B&B) is the exact tree search method typically used to solve Mixed-Integer Linear Programming problems (MILPs).
We propose a novel imitation learning framework, and introduce new input features and architectures to represent branching.
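Imitation learning for branching is usually set up as behavioral cloning against an expert such as strong branching: score the candidate variables at a node, then train with cross-entropy toward the expert's pick. The sketch below follows that recipe with a toy MLP scorer; the paper's input features and architectures are substantially richer.

```python
import torch
import torch.nn as nn

# Behavioral cloning for variable selection (toy sketch): an MLP scores
# each candidate variable from its features; cross-entropy pushes the
# scores toward the expert's (e.g. strong branching) choice.
scorer = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

def bc_step(cand_feats, expert_idx):
    # cand_feats: (n_candidates, 16); expert_idx: index the expert branched on.
    logits = scorer(cand_feats).squeeze(-1)            # one score per candidate
    loss = nn.functional.cross_entropy(
        logits.unsqueeze(0), torch.tensor([expert_idx])
    )
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage: a node with 10 candidates, expert chose variable 3.
print(bc_step(torch.randn(10, 16), 3))
```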
arXiv Detail & Related papers (2020-02-12T17:43:23Z)