Reinforcement Learning for Branch-and-Bound Optimisation using
Retrospective Trajectories
- URL: http://arxiv.org/abs/2205.14345v1
- Date: Sat, 28 May 2022 06:08:07 GMT
- Title: Reinforcement Learning for Branch-and-Bound Optimisation using
Retrospective Trajectories
- Authors: Christopher W. F. Parsonson, Alexandre Laterre, Thomas D. Barrett
- Abstract summary: Machine learning has emerged as a promising paradigm for branching.
We propose retro branching; a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
- Score: 72.15369769265398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Combinatorial optimisation problems framed as mixed integer linear programmes
(MILPs) are ubiquitous across a range of real-world applications. The canonical
branch-and-bound (B&B) algorithm seeks to exactly solve MILPs by constructing a
search tree of increasingly constrained sub-problems. In practice, its solving
time performance is dependent on heuristics, such as the choice of the next
variable to constrain ('branching'). Recently, machine learning (ML) has
emerged as a promising paradigm for branching. However, prior works have
struggled to apply reinforcement learning (RL), citing sparse rewards,
difficult exploration, and partial observability as significant challenges.
Instead, leading ML methodologies resort to approximating high quality
handcrafted heuristics with imitation learning (IL), which precludes the
discovery of novel policies and requires expensive data labelling. In this
work, we propose retro branching; a simple yet effective approach to RL for
branching. By retrospectively deconstructing the search tree into multiple
paths each contained within a sub-tree, we enable the agent to learn from
shorter trajectories with more predictable next states. In experiments on four
combinatorial tasks, our approach enables learning-to-branch without any expert
guidance or pre-training. We outperform the current state-of-the-art RL
branching algorithm by 3-5x and come within 20% of the best IL method's
performance on MILPs with 500 constraints and 1000 variables, with ablations
verifying that our retrospectively constructed trajectories are essential to
achieving these results.
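To make the core idea concrete, the following is a minimal Python sketch of how a solved B&B search tree might be retrospectively deconstructed into paths, each confined to a sub-tree, for RL training. The SearchTreeNode container and the rule for choosing which child extends the current path (largest sub-tree) are illustrative assumptions, not the paper's exact construction; rewards and terminal handling are omitted.

```python
# Illustrative sketch of retrospective trajectory construction (not the paper's exact rule).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SearchTreeNode:
    state: object                      # branching observation recorded at this node
    action: Optional[int] = None       # variable branched on at this node (None if leaf)
    children: List["SearchTreeNode"] = field(default_factory=list)

def subtree_size(node: SearchTreeNode) -> int:
    return 1 + sum(subtree_size(c) for c in node.children)

def retro_trajectories(root: SearchTreeNode) -> List[List[SearchTreeNode]]:
    """Deconstruct a solved B&B tree into paths, each contained within one sub-tree."""
    trajectories = []
    roots = [root]                     # sub-tree roots still to be converted into paths
    while roots:
        node = roots.pop()
        path = []
        while node.children:
            path.append(node)
            # follow one child to extend the path; its siblings seed new sub-trees
            nxt = max(node.children, key=subtree_size)
            roots.extend(c for c in node.children if c is not nxt)
            node = nxt
        if path:
            trajectories.append(path)  # consecutive nodes form (state, action, next state) steps
    return trajectories
```

Each resulting path can then be treated as a short episode whose next states are fully determined by the recorded tree, which is what yields the shorter trajectories with more predictable next states described in the abstract.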
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and a node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have demonstrated remarkable reasoning capabilities when combined with advanced prompting techniques.
Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks.
We propose a new concept, the autonomous tree-search ability of LLMs, whereby the model automatically generates a response containing the search trajectories that lead to the correct answer.
arXiv Detail & Related papers (2023-10-14T14:14:38Z) - TreeDQN: Learning to minimize Branch-and-Bound tree [78.52895577861327]
Branch-and-Bound is a convenient approach to solving optimization tasks formulated as Mixed-Integer Linear Programs.
The efficiency of the solver depends on the branching rule used to select a variable for splitting.
We propose a reinforcement learning method that can efficiently learn a branching rule.
arXiv Detail & Related papers (2023-06-09T14:01:26Z) - Improving and Benchmarking Offline Reinforcement Learning Algorithms [87.67996706673674]
This work aims to bridge the gaps caused by low-level implementation choices and datasets.
We empirically investigate 20 implementation choices using three representative algorithms.
We find that two variants, CRR+ and CQL+, achieve new state-of-the-art results on D4RL.
arXiv Detail & Related papers (2023-06-01T17:58:46Z) - Branch Ranking for Efficient Mixed-Integer Programming via Offline
Ranking-based Policy Learning [45.1011106869493]
We formulate learning to branch as an offline reinforcement learning (RL) problem.
We train a branching model named Branch Ranking via offline policy learning.
Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Ranking is more efficient and robust.
arXiv Detail & Related papers (2022-07-26T15:34:10Z) - Deep Reinforcement Learning for Exact Combinatorial Optimization:
Learning to Branch [13.024115985194932]
We propose a new approach, based on the reinforcement learning (RL) paradigm, for addressing the data-labelling and inference issues in learning-based optimization.
We use imitation learning to bootstrap an RL agent and then use Proximal Policy Optimization (PPO) to further explore globally optimal actions.
arXiv Detail & Related papers (2022-06-14T16:35:58Z) - Learning to branch with Tree MDPs [6.754135838894833]
We propose to learn branching rules from scratch via Reinforcement Learning (RL).
We propose tree Markov Decision Processes, or tree MDPs, a generalization of temporal MDPs that provides a more suitable framework for learning to branch.
We demonstrate through computational experiments that tree MDPs improve learning convergence and offer a promising framework for tackling the learning-to-branch problem in MILPs.
arXiv Detail & Related papers (2022-05-23T07:57:32Z) - Branching Reinforcement Learning [16.437993672422955]
We propose a novel Branching Reinforcement Learning (Branching RL) model.
We investigate Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model.
This model finds important applications in hierarchical recommendation systems and online advertising.
arXiv Detail & Related papers (2022-02-16T11:19:03Z) - Reinforcement Learning for Variable Selection in a Branch and Bound
Algorithm [0.10499611180329801]
We leverage patterns in real-world instances to learn from scratch a new branching strategy optimised for a given problem.
We propose FMSTS, a novel Reinforcement Learning approach specifically designed for this task.
arXiv Detail & Related papers (2020-05-20T13:15:48Z) - Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies [76.83991682238666]
Branch and Bound (B&B) is the exact tree search method typically used to solve Mixed-Integer Linear Programming problems (MILPs).
We propose a novel imitation learning framework, and introduce new input features and architectures to represent branching.
arXiv Detail & Related papers (2020-02-12T17:43:23Z)