A Study of Learning Search Approximation in Mixed Integer Branch and
Bound: Node Selection in SCIP
- URL: http://arxiv.org/abs/2007.03948v2
- Date: Mon, 3 Jan 2022 21:04:30 GMT
- Title: A Study of Learning Search Approximation in Mixed Integer Branch and
Bound: Node Selection in SCIP
- Authors: Kaan Yilmaz and Neil Yorke-Smith
- Abstract summary: We present an offline method to learn such a policy in two settings.
The first corresponds to a child selector during plunging, while the second is akin to a diving heuristic.
Empirical results on five MIP datasets indicate that our node selection policy leads to solutions significantly more quickly than the state-of-the-art precedent in the literature.
- Score: 2.528056693920671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In line with the growing trend of using machine learning to help solve
combinatorial optimisation problems, one promising idea is to improve node
selection within a mixed integer programming (MIP) branch-and-bound tree by
using a learned policy. Previous work using imitation learning indicates the
feasibility of acquiring a node selection policy, by learning an adaptive node
searching order. In contrast, our imitation learning policy is focused solely
on learning which of a node's children to select. We present an offline method
to learn such a policy in two settings: one that comprises a heuristic by
committing to pruning of nodes; one that is exact and backtracks from a leaf to
guarantee finding the optimal integer solution. The former setting corresponds
to a child selector during plunging, while the latter is akin to a diving
heuristic. We apply the policy within the popular open-source solver SCIP, in
both heuristic and exact settings. Empirical results on five MIP datasets
indicate that our node selection policy leads to solutions significantly more
quickly than the state-of-the-art precedent in the literature. While we do not
beat the highly-optimised SCIP state-of-practice baseline node selector in
terms of solving time on exact solutions, our heuristic policies have a
consistently better optimality gap than all baselines, if the accuracy of the
predictive model is sufficient. Further, the results also indicate that, when a
time limit is applied, our heuristic method finds better solutions than all
baselines in the majority of problems tested. We explain the results by showing
that the learned policies have imitated the SCIP baseline, but without the
latter's early plunge abort. Our recommendation is that, despite the clear
improvements over the literature, this kind of MIP child selector is better
seen in a broader approach using learning in MIP branch-and-bound tree
decisions.
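As a concrete, hypothetical illustration of the two settings, the sketch below contrasts a committing plunge with an exact dive that backtracks. `Node`, `child_features`, and `predict_child` are stand-ins for SCIP internals and the learned model, not the authors' implementation:

```python
# Illustrative sketch (not the authors' code) of the two settings:
# (a) a learned child selector used as a pruning heuristic during plunging,
# (b) the same selector ordering children inside an exact, backtracking dive.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    lp_bound: float                      # LP relaxation bound at this node
    children: Tuple["Node", ...] = ()    # empty tuple for a leaf

def child_features(parent: Node, child: Node) -> List[float]:
    # Stand-in for the branching/LP features the trained model consumes.
    return [child.lp_bound, child.lp_bound - parent.lp_bound]

def predict_child(parent: Node) -> int:
    # Stand-in for the learned policy: index of the child to descend into.
    scores = [sum(child_features(parent, c)) for c in parent.children]
    return min(range(len(scores)), key=scores.__getitem__)

def plunge(root: Node) -> Node:
    """Heuristic setting: commit to the predicted child, pruning its siblings."""
    node = root
    while node.children:
        node = node.children[predict_child(node)]
    return node  # leaf reached; unchosen subtrees are never revisited

def exact_dive(root: Node) -> List[Node]:
    """Exact setting: same child ordering, but siblings stay on a stack so the
    search backtracks from leaves (bound-based pruning omitted for brevity)."""
    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        if not node.children:
            leaves.append(node)
            continue
        best = predict_child(node)
        # Push siblings first so the predicted child is explored next (LIFO).
        for i, child in enumerate(node.children):
            if i != best:
                stack.append(child)
        stack.append(node.children[best])
    return leaves
```

In SCIP itself the policy would sit behind the solver's node selection plugin interface; the sketch only mirrors the control flow.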
Related papers
- Reinforcement Learning for Node Selection in Branch-and-Bound [52.2648997215667]
Current state-of-the-art selectors utilize either hand-crafted ensembles that automatically switch between naive sub-node selectors, or learned node selectors that rely on individual node data.
We propose a novel simulation technique that uses reinforcement learning (RL) while considering the entire tree state, rather than just isolated nodes.
arXiv Detail & Related papers (2023-09-29T19:55:56Z)
- Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization [101.08992036691673]
This paper explores a realistic unsupervised fine-tuning scenario, considering the presence of out-of-distribution samples from unknown classes.
In particular, we focus on simultaneously enhancing out-of-distribution detection and the recognition of instances associated with known classes.
We present a simple, efficient, and effective approach called Universal Entropy Optimization (UEO).
arXiv Detail & Related papers (2023-08-24T16:47:17Z)
- Optimal Decision Tree Policies for Markov Decision Processes [7.995360025953931]
We study the optimization of size-limited decision trees for Markov Decision Processes (MDPs).
We show that imitation learning has an inherent shortcoming here, namely that complex policies cannot be represented using size-limited trees.
While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.
arXiv Detail & Related papers (2023-01-30T18:51:02Z)
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
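As a hedged aside, the standard GPI definition from the successor-features literature (not necessarily this paper's exact notation): given policies $\pi_1, \dots, \pi_n$ with action-value functions $q^{\pi_i}$, the GPI policy acts greedily over all of them, $\pi(s) \in \arg\max_a \max_i q^{\pi_i}(s, a)$, and is guaranteed to perform at least as well as each $\pi_i$.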
arXiv Detail & Related papers (2023-01-18T20:54:40Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
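For illustration, the pessimism principle can be sketched in generic notation (not necessarily the paper's): rather than maximizing a point estimate $\hat{V}(\pi)$ of policy value, select $\hat{\pi} \in \arg\max_{\pi \in \Pi} \big( \hat{V}(\pi) - \beta\, \widehat{\mathrm{se}}(\pi) \big)$, where $\widehat{\mathrm{se}}(\pi)$ is an uncertainty estimate and $\beta > 0$ a confidence parameter, so poorly-covered policies are penalized.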
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning [45.1011106869493]
We formulate learning to branch as an offline reinforcement learning (RL) problem.
We train a branching model named Branch Ranking via offline policy learning.
Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Ranking is more efficient and robust.
arXiv Detail & Related papers (2022-07-26T15:34:10Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
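A minimal, hypothetical sketch of a runtime-oriented bandit update with censoring (illustrative only; the paper's estimators and bounds differ):

```python
# Hypothetical sketch: a mean-runtime bandit over candidate algorithms where
# runs are stopped at a cutoff C, so only min(runtime, C) is ever observed.
import math

def select_algorithm(counts, mean_runtimes, t, C):
    """Pick the algorithm with the lowest optimistic runtime estimate."""
    def optimistic(i):
        if counts[i] == 0:
            return float("-inf")          # try every algorithm once
        return mean_runtimes[i] - C * math.sqrt(2 * math.log(t) / counts[i])
    return min(range(len(counts)), key=optimistic)

def record_run(counts, mean_runtimes, i, runtime, C):
    """Incorporate a (possibly censored) runtime observation for algorithm i."""
    observed = min(runtime, C)            # censoring at the cutoff
    counts[i] += 1
    mean_runtimes[i] += (observed - mean_runtimes[i]) / counts[i]
```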
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs [6.084888301899142]
We consider a learning-based LNS approach for mixed integer programs (MIPs).
We train a Neural Diving model to represent a probability distribution over assignments, which, together with an off-the-shelf MIP solver, generates an initial assignment.
We train a Neural Neighborhood Selection policy to select a search neighborhood at each step, which is searched using a MIP solver to find the next assignment.
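A self-contained toy sketch of this two-model loop (the stubs below stand in for Neural Diving, the neighborhood policy, and the MIP solver; none of this is the paper's code):

```python
# Toy LNS loop over a 0/1 problem: diving stub proposes an assignment,
# a selection stub picks variables to unfix, a brute-force stub "solves"
# the sub-problem. Each piece stands in for a learned model or MIP solver.
import random

def neural_diving(num_vars):
    # Stub for the Neural Diving model: propose an initial 0/1 assignment.
    return [random.randint(0, 1) for _ in range(num_vars)]

def select_neighborhood(assignment, k):
    # Stub for the learned neighborhood policy: choose k variables to unfix.
    return random.sample(range(len(assignment)), k)

def solve_sub_mip(objective, assignment, free_vars):
    # Stub for the MIP solver call: re-optimize only the unfixed variables
    # (here by brute force over the free 0/1 variables).
    best = list(assignment)
    for mask in range(2 ** len(free_vars)):
        cand = list(assignment)
        for j, v in enumerate(free_vars):
            cand[v] = (mask >> j) & 1
        if objective(cand) < objective(best):
            best = cand
    return best

def lns(objective, num_vars, steps=10, k=3):
    x = neural_diving(num_vars)                # initial assignment
    for _ in range(steps):
        free = select_neighborhood(x, k)       # learned selection step
        x = solve_sub_mip(objective, x, free)  # solver improves within nbhd
    return x
```

For example, `lns(lambda x: sum(x), num_vars=8)` drives the toy 0/1 objective toward all zeros.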
arXiv Detail & Related papers (2021-07-21T16:43:46Z)
- Exploiting Learned Policies in Focal Search [0.49723239539321284]
We show how to integrate policy learning into a bounded-suboptimal search algorithm.
We evaluate our focal search variants over three benchmark domains using our synthetic approach, and on the 15-puzzle using a neural network learned using 1.5 million examples.
We observe that Discrepancy Focal Search, which expands the node that maximizes an approximation of the probability that its corresponding path is a prefix of an optimal path, generally obtains the best results in terms of runtime and solution quality.
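As a rough, hypothetical illustration of that expansion rule (the names and the w-bounded focal set are generic focal-search machinery, not the paper's code):

```python
# Hypothetical sketch of a policy-guided focal search. Among nodes whose
# f-value is within a factor w of the best f-value (the focal set), expand
# the node whose path has the highest log-probability under a learned policy,
# approximating "probability this path is a prefix of an optimal path".
import heapq
from itertools import count

def focal_search(start, successors, h, policy_logprob, is_goal, w=1.5):
    tie = count()
    # Heap entries: (f, tiebreak, node, g, path log-probability).
    open_list = [(h(start), next(tie), start, 0.0, 0.0)]
    while open_list:
        f_min = open_list[0][0]
        focal = [e for e in open_list if e[0] <= w * f_min]
        entry = max(focal, key=lambda e: e[4])  # most probable path prefix
        open_list.remove(entry)                 # O(n); fine for a sketch
        heapq.heapify(open_list)
        f, _, node, g, logp = entry
        if is_goal(node):
            return node, g                      # cost within factor w of optimal
        for child, step_cost in successors(node):
            g2 = g + step_cost
            heapq.heappush(open_list, (g2 + h(child), next(tie), child, g2,
                                       logp + policy_logprob(node, child)))
    return None, float("inf")
```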
arXiv Detail & Related papers (2021-04-21T13:50:40Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)