Solving Mixed Integer Programs Using Neural Networks
- URL: http://arxiv.org/abs/2012.13349v1
- Date: Wed, 23 Dec 2020 09:33:11 GMT
- Title: Solving Mixed Integer Programs Using Neural Networks
- Authors: Vinod Nair, Sergey Bartunov, Felix Gimeno, Ingrid von Glehn, Pawel
Lichocki, Ivan Lobov, Brendan O'Donoghue, Nicolas Sonnerat, Christian
Tjandraatmadja, Pengming Wang, Ravichandra Addanki, Tharindi Hapuarachchi,
Thomas Keck, James Keeling, Pushmeet Kohli, Ira Ktena, Yujia Li, Oriol
Vinyals, Yori Zwols
- Abstract summary: This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
- Score: 57.683491412480635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed Integer Programming (MIP) solvers rely on an array of sophisticated
heuristics developed with decades of research to solve large-scale MIP
instances encountered in practice. Machine learning offers to automatically
construct better heuristics from data by exploiting shared structure among
instances in the data. This paper applies learning to the two key sub-tasks of
a MIP solver, generating a high-quality joint variable assignment, and bounding
the gap in objective value between that assignment and an optimal one. Our
approach constructs two corresponding neural network-based components, Neural
Diving and Neural Branching, to use in a base MIP solver such as SCIP. Neural
Diving learns a deep neural network to generate multiple partial assignments
for its integer variables, and the resulting smaller MIPs for un-assigned
variables are solved with SCIP to construct high quality joint assignments.
Neural Branching learns a deep neural network to make variable selection
decisions in branch-and-bound to bound the objective value gap with a small
tree. This is done by imitating a new variant of Full Strong Branching we
propose that scales to large instances using GPUs. We evaluate our approach on
six diverse real-world datasets, including two Google production datasets and
MIPLIB, by training separate neural networks on each. Most instances in all the
datasets combined have $10^3-10^6$ variables and constraints after presolve,
which is significantly larger than previous learning approaches. Comparing
solvers with respect to primal-dual gap averaged over a held-out set of
instances, the learning-augmented SCIP is 2x to 10x better on all datasets
except one on which it is $10^5$x better, at large time limits. To the best of
our knowledge, ours is the first learning approach to demonstrate such large
improvements over SCIP on both large-scale real-world application datasets and
MIPLIB.
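The fix-and-solve step at the core of Neural Diving can be illustrated with a short sketch. The code below is a minimal sketch, assuming PySCIPOpt as the interface to SCIP; `predict_assignment` is a hypothetical stand-in for the paper's learned model, and the sketch covers only the step where predicted values for a subset of integer variables are fixed and the remaining smaller MIP is handed to SCIP.

```python
# Minimal sketch of the Neural Diving fix-and-solve step.
# Assumptions: PySCIPOpt is installed, and `predict_assignment` is a
# hypothetical placeholder for the paper's learned model over integer values.
from pyscipopt import Model


def predict_assignment(mip_path):
    """Hypothetical: a trained network would return {variable name: value}
    for a subset of the integer variables; only the interface is defined here."""
    raise NotImplementedError("plug in a trained model here")


def neural_diving_solve(mip_path, time_limit=60.0):
    """Fix the model-chosen integer variables, then let SCIP solve the sub-MIP."""
    model = Model()
    model.readProblem(mip_path)
    model.setParam("limits/time", time_limit)

    partial = predict_assignment(mip_path)
    for var in model.getVars():
        if var.name in partial and var.vtype() in ("BINARY", "INTEGER"):
            # Fixing via an equality constraint keeps the sketch simple; the
            # resulting smaller MIP on the unassigned variables goes to SCIP.
            model.addCons(var == partial[var.name])

    model.optimize()
    if model.getNSols() > 0:
        return model.getObjVal(), {v.name: model.getVal(v) for v in model.getVars()}
    return None
```

In the paper, multiple partial assignments are sampled per instance and the resulting sub-MIPs are solved to produce high-quality joint assignments, while Neural Branching separately guides variable selection inside branch-and-bound; the sketch above covers only a single fix-and-solve call.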
Related papers
- MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning [7.262751938473306]
Pruning is a well-established technique that reduces the size of neural networks while mathematically guaranteeing accuracy preservation.
We develop a new pruning algorithm, MPruner, that leverages mutual information through vector similarity.
MPruner achieved up to a 50% reduction in parameters and memory usage for CNN and transformer-based models, with minimal to no loss in accuracy.
arXiv Detail & Related papers (2024-08-24T05:54:47Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- End-to-End Learning of Deep Kernel Acquisition Functions for Bayesian Optimization [39.56814839510978]
We propose a meta-learning method for Bayesian optimization with neural network-based kernels.
Our model is trained with a reinforcement learning framework across multiple tasks.
In experiments using three text document datasets, we demonstrate that the proposed method achieves better BO performance than the existing methods.
arXiv Detail & Related papers (2021-11-01T00:42:31Z)
- Efficient and Robust Mixed-Integer Optimization Methods for Training Binarized Deep Neural Networks [0.07614628596146598]
We study deep neural networks with binary activation functions and continuous or integer weights (BDNN).
We show that the BDNN can be reformulated as a mixed-integer linear program with bounded weight space which can be solved to global optimality by classical mixed-integer programming solvers; a minimal big-M encoding of one such neuron is sketched just after this list.
For the first time a robust model is presented which enforces robustness of the BDNN during training.
arXiv Detail & Related papers (2021-10-21T18:02:58Z)
- Fully Dynamic Inference with Deep Neural Networks [19.833242253397206]
Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped.
On the CIFAR-10 dataset, LC-Net results in up to $11.9\times$ fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods.
On the ImageNet dataset, LC-Net achieves up to $1.4\times$ fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
arXiv Detail & Related papers (2020-07-29T23:17:48Z)
- DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures.
We achieve a $75.1\%$ top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
arXiv Detail & Related papers (2020-05-29T09:02:16Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale learning with a deep neural network.
Our algorithm requires far fewer communication rounds while retaining the same theoretical convergence guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of the algorithm and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming [11.712073757744452]
We introduce a mixed integer program (MIP) for assigning importance scores to each neuron in deep neural network architectures.
We drive the solver to minimize the number of critical neurons (i.e., those with high importance scores) that need to be kept to maintain the overall accuracy of the trained neural network.
arXiv Detail & Related papers (2020-02-17T21:32:47Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
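As a companion to the binarized-network entry above, here is a minimal sketch of a big-M MILP encoding for a single binary-activation neuron. The bounds, big-M constant, and epsilon margin are illustrative assumptions, not the BDNN paper's exact formulation.

```python
# Minimal sketch: big-M MILP encoding of one binary-activation neuron.
# Bounds, M, and eps are illustrative assumptions, not the paper's formulation.
from pyscipopt import Model, quicksum

x = [1.0, -2.0, 0.5]   # one fixed, bounded input sample
M = 100.0              # big-M constant valid for the chosen variable bounds
eps = 1e-4             # tie-breaking margin at the activation threshold

model = Model("binary_neuron")

# Integer weights in [-1, 1] and a continuous bias; z is the neuron output,
# intended to be 1 exactly when the pre-activation is nonnegative.
w = [model.addVar(vtype="I", lb=-1, ub=1, name=f"w{i}") for i in range(len(x))]
b = model.addVar(vtype="C", lb=-10, ub=10, name="b")
z = model.addVar(vtype="B", name="z")

pre = quicksum(w[i] * x[i] for i in range(len(x))) + b
model.addCons(pre <= M * z - eps * (1 - z))   # z = 0  =>  pre <= -eps
model.addCons(pre >= -M * (1 - z))            # z = 1  =>  pre >= 0

# A full training MILP stacks such constraints over all layers and samples and
# adds a classification objective; here we only ask for a point where z fires.
model.addCons(z == 1)
model.setObjective(b, "minimize")
model.optimize()
print(model.getStatus(), {v.name: model.getVal(v) for v in w + [b, z]})
```

A classical MIP solver can optimize such a formulation to global optimality, which is the reformulation that entry refers to.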
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all summaries) and is not responsible for any consequences.