A Study of Learning Search Approximation in Mixed Integer Branch and
Bound: Node Selection in SCIP
- URL: http://arxiv.org/abs/2007.03948v2
- Date: Mon, 3 Jan 2022 21:04:30 GMT
- Title: A Study of Learning Search Approximation in Mixed Integer Branch and
Bound: Node Selection in SCIP
- Authors: Kaan Yilmaz and Neil Yorke-Smith
- Abstract summary: We present an offline method to learn such a policy in two settings.
The first corresponds to a child selector during plunging, while the second is akin to a diving heuristic.
Empirical results on five MIP datasets indicate that our node selection policy leads to solutions significantly more quickly than the state-of-the-art precedent in the literature.
- Score: 2.528056693920671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In line with the growing trend of using machine learning to help solve
combinatorial optimisation problems, one promising idea is to improve node
selection within a mixed integer programming (MIP) branch-and-bound tree by
using a learned policy. Previous work using imitation learning indicates the
feasibility of acquiring a node selection policy, by learning an adaptive node
searching order. In contrast, our imitation learning policy is focused solely
on learning which of a node's children to select. We present an offline method
to learn such a policy in two settings: one that comprises a heuristic by
committing to pruning of nodes; one that is exact and backtracks from a leaf to
guarantee finding the optimal integer solution. The former setting corresponds
to a child selector during plunging, while the latter is akin to a diving
heuristic. We apply the policy within the popular open-source solver SCIP, in
both heuristic and exact settings. Empirical results on five MIP datasets
indicate that our node selection policy leads to solutions significantly more
quickly than the state-of-the-art precedent in the literature. While we do not
beat the highly-optimised SCIP state-of-practice baseline node selector in
terms of solving time on exact solutions, our heuristic policies have a
consistently better optimality gap than all baselines, if the accuracy of the
predictive model is sufficient. Further, the results also indicate that, when a
time limit is applied, our heuristic method finds better solutions than all
baselines in the majority of problems tested. We explain the results by showing
that the learned policies have imitated the SCIP baseline, but without the
latter's early plunge abort. Our recommendation is that, despite the clear
improvements over the literature, this kind of MIP child selector is better
seen in a broader approach using learning in MIP branch-and-bound tree
decisions.
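As a concrete, hypothetical illustration of the two settings, the sketch below contrasts a committing plunge with an exact dive that backtracks. `Node`, `child_features`, and `predict_child` are stand-ins for SCIP internals and the learned model, not the authors' implementation:

```python
# Illustrative sketch (not the authors' code) of the two settings:
# (a) a learned child selector used as a pruning heuristic during plunging,
# (b) the same selector ordering children inside an exact, backtracking dive.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    lp_bound: float                      # LP relaxation bound at this node
    children: Tuple["Node", ...] = ()    # empty tuple for a leaf

def child_features(parent: Node, child: Node) -> List[float]:
    # Stand-in for the branching/LP features the trained model consumes.
    return [child.lp_bound, child.lp_bound - parent.lp_bound]

def predict_child(parent: Node) -> int:
    # Stand-in for the learned policy: index of the child to descend into.
    scores = [sum(child_features(parent, c)) for c in parent.children]
    return min(range(len(scores)), key=scores.__getitem__)

def plunge(root: Node) -> Node:
    """Heuristic setting: commit to the predicted child, pruning its siblings."""
    node = root
    while node.children:
        node = node.children[predict_child(node)]
    return node  # leaf reached; unchosen subtrees are never revisited

def exact_dive(root: Node) -> List[Node]:
    """Exact setting: same child ordering, but siblings stay on a stack so the
    search backtracks from leaves (bound-based pruning omitted for brevity)."""
    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        if not node.children:
            leaves.append(node)
            continue
        best = predict_child(node)
        # Push siblings first so the predicted child is explored next (LIFO).
        for i, child in enumerate(node.children):
            if i != best:
                stack.append(child)
        stack.append(node.children[best])
    return leaves
```

In SCIP itself the policy would sit behind the solver's node selection plugin interface; the sketch only mirrors the control flow.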
Related papers
- Reinforcement Learning for Node Selection in Branch-and-Bound [52.2648997215667]
Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data.
We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes.
arXiv Detail & Related papers (2023-09-29T19:55:56Z)
- Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization [101.08992036691673]
This paper explores a realistic unsupervised fine-tuning scenario, considering the presence of out-of-distribution samples from unknown classes.
In particular, we focus on simultaneously enhancing out-of-distribution detection and the recognition of instances associated with known classes.
We present a simple, efficient, and effective approach called Universal Entropy Optimization (UEO).
arXiv Detail & Related papers (2023-08-24T16:47:17Z)
- Optimal Decision Tree Policies for Markov Decision Processes [7.995360025953931]
We study the optimization of size-limited decision trees for Markov Decision Processes (MDPs).
We show that imitation learning has an inherent shortcoming here, namely that complex policies cannot be represented using size-limited trees.
While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.
arXiv Detail & Related papers (2023-01-30T18:51:02Z)
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
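As a hedged aside, the standard GPI definition from the successor-features literature (not necessarily this paper's exact notation): given policies $\pi_1, \dots, \pi_n$ with action-value functions $q^{\pi_i}$, the GPI policy acts greedily over all of them, $\pi(s) \in \arg\max_a \max_i q^{\pi_i}(s, a)$, and is guaranteed to perform at least as well as each $\pi_i$.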
arXiv Detail & Related papers (2023-01-18T20:54:40Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
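For illustration, the pessimism principle can be sketched in generic notation (not necessarily the paper's): rather than maximizing a point estimate $\hat{V}(\pi)$ of policy value, select $\hat{\pi} \in \arg\max_{\pi \in \Pi} \big( \hat{V}(\pi) - \beta\, \widehat{\mathrm{se}}(\pi) \big)$, where $\widehat{\mathrm{se}}(\pi)$ is an uncertainty estimate and $\beta > 0$ a confidence parameter, so poorly-covered policies are penalized.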
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning [45.1011106869493]
We formulate learning to branch as an offline reinforcement learning (RL) problem.
We train a branching model named Branch Ranking via offline policy learning.
Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Ranking is more efficient and robust.
arXiv Detail & Related papers (2022-07-26T15:34:10Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
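A minimal, hypothetical sketch of a runtime-oriented bandit update with censoring (illustrative only; the paper's estimators and bounds differ):

```python
# Hypothetical sketch: a mean-runtime bandit over candidate algorithms where
# runs are stopped at a cutoff C, so only min(runtime, C) is ever observed.
import math

def select_algorithm(counts, mean_runtimes, t, C):
    """Pick the algorithm with the lowest optimistic runtime estimate."""
    def optimistic(i):
        if counts[i] == 0:
            return float("-inf")          # try every algorithm once
        return mean_runtimes[i] - C * math.sqrt(2 * math.log(t) / counts[i])
    return min(range(len(counts)), key=optimistic)

def record_run(counts, mean_runtimes, i, runtime, C):
    """Incorporate a (possibly censored) runtime observation for algorithm i."""
    observed = min(runtime, C)            # censoring at the cutoff
    counts[i] += 1
    mean_runtimes[i] += (observed - mean_runtimes[i]) / counts[i]
```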
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs [6.084888301899142]
We consider a learning-based LNS approach for mixed integer programs (MIPs).
We train a Neural Diving model to represent a probability distribution over assignments, which, together with an off-the-shelf MIP solver, generates an initial assignment.
We train a Neural Neighborhood Selection policy to select a search neighborhood at each step, which is searched using a MIP solver to find the next assignment.
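A self-contained toy sketch of this two-model loop (the stubs below stand in for Neural Diving, the neighborhood policy, and the MIP solver; none of this is the paper's code):

```python
# Toy LNS loop over a 0/1 problem: diving stub proposes an assignment,
# a selection stub picks variables to unfix, a brute-force stub "solves"
# the sub-problem. Each piece stands in for a learned model or MIP solver.
import random

def neural_diving(num_vars):
    # Stub for the Neural Diving model: propose an initial 0/1 assignment.
    return [random.randint(0, 1) for _ in range(num_vars)]

def select_neighborhood(assignment, k):
    # Stub for the learned neighborhood policy: choose k variables to unfix.
    return random.sample(range(len(assignment)), k)

def solve_sub_mip(objective, assignment, free_vars):
    # Stub for the MIP solver call: re-optimize only the unfixed variables
    # (here by brute force over the free 0/1 variables).
    best = list(assignment)
    for mask in range(2 ** len(free_vars)):
        cand = list(assignment)
        for j, v in enumerate(free_vars):
            cand[v] = (mask >> j) & 1
        if objective(cand) < objective(best):
            best = cand
    return best

def lns(objective, num_vars, steps=10, k=3):
    x = neural_diving(num_vars)                # initial assignment
    for _ in range(steps):
        free = select_neighborhood(x, k)       # learned selection step
        x = solve_sub_mip(objective, x, free)  # solver improves within nbhd
    return x
```

For example, `lns(lambda x: sum(x), num_vars=8)` drives the toy 0/1 objective toward all zeros.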
arXiv Detail & Related papers (2021-07-21T16:43:46Z)
- Exploiting Learned Policies in Focal Search [0.49723239539321284]
We show how to integrate policy learning into a bounded-suboptimal search algorithm.
We evaluate our focal search variants over three benchmark domains using our synthetic approach, and on the 15-puzzle using a neural network learned using 1.5 million examples.
We observe that Discrepancy Focal Search, which expands the node that maximizes an approximation of the probability that its corresponding path is a prefix of an optimal path, generally obtains the best results in terms of runtime and solution quality.
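As a rough, hypothetical illustration of that expansion rule (the names and the w-bounded focal set are generic focal-search machinery, not the paper's code):

```python
# Hypothetical sketch of a policy-guided focal search. Among nodes whose
# f-value is within a factor w of the best f-value (the focal set), expand
# the node whose path has the highest log-probability under a learned policy,
# approximating "probability this path is a prefix of an optimal path".
import heapq
from itertools import count

def focal_search(start, successors, h, policy_logprob, is_goal, w=1.5):
    tie = count()
    # Heap entries: (f, tiebreak, node, g, path log-probability).
    open_list = [(h(start), next(tie), start, 0.0, 0.0)]
    while open_list:
        f_min = open_list[0][0]
        focal = [e for e in open_list if e[0] <= w * f_min]
        entry = max(focal, key=lambda e: e[4])  # most probable path prefix
        open_list.remove(entry)                 # O(n); fine for a sketch
        heapq.heapify(open_list)
        f, _, node, g, logp = entry
        if is_goal(node):
            return node, g                      # cost within factor w of optimal
        for child, step_cost in successors(node):
            g2 = g + step_cost
            heapq.heappush(open_list, (g2 + h(child), next(tie), child, g2,
                                       logp + policy_logprob(node, child)))
    return None, float("inf")
```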
arXiv Detail & Related papers (2021-04-21T13:50:40Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)