Optimal training of integer-valued neural networks with mixed integer
programming
- URL: http://arxiv.org/abs/2009.03825v5
- Date: Fri, 31 Mar 2023 17:15:54 GMT
- Title: Optimal training of integer-valued neural networks with mixed integer
programming
- Authors: Tómas Thorbjarnarson and Neil Yorke-Smith
- Abstract summary: We develop new MIP models which improve training efficiency and which can train the important class of integer-valued neural networks (INNs).
We provide a batch training method that dramatically increases the amount of data that MIP solvers can use to train.
Experimental results on two real-world data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NNs with MIP.
- Score: 2.528056693920671
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown potential in using Mixed Integer Programming (MIP)
solvers to optimize certain aspects of neural networks (NNs). However, the
intriguing approach of training NNs with MIP solvers is under-explored.
State-of-the-art methods to train NNs are typically gradient-based and require
significant data, computation on GPUs, and extensive hyper-parameter tuning. In
contrast, training with MIP solvers does not require GPUs or heavy
hyper-parameter tuning, but currently cannot handle anything but small amounts
of data. This article builds on recent advances that train binarized NNs using
MIP solvers. We go beyond current work by formulating new MIP models which
improve training efficiency and which can train the important class of
integer-valued neural networks (INNs). We provide two novel methods to further
the potential significance of using MIP to train NNs. The first method
optimizes the number of neurons in the NN while training. This reduces the need
for deciding on network architecture before training. The second method
addresses the amount of training data which MIP can feasibly handle: we provide
a batch training method that dramatically increases the amount of data that MIP
solvers can use to train. We thus provide a promising step towards using much
more data than before when training NNs using MIP models. Experimental results
on two real-world data-limited datasets demonstrate that our approach strongly
outperforms the previous state of the art in training NNs with MIP, in terms of
accuracy, training time and amount of data. Our methodology is proficient at
training NNs when minimal training data is available, and at training with
minimal memory requirements, which is potentially valuable for deploying to
low-memory devices.
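To make the approach concrete, the following is a minimal, hypothetical sketch of MIP-based training of a small integer-valued network. It is not the paper's formulation: it trains one sign-activated hidden layer with integer weights and fixed +1 output weights on a toy XOR dataset, encodes each hidden activation with a binary variable and big-M constraints, and minimises total margin slack. The PuLP modeller, its bundled CBC solver, and all names below are assumptions made purely for illustration.

import pulp

# Toy integer-valued dataset (XOR), labels in {-1, +1}.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, +1, +1, -1]

H, P = 3, 2                      # hidden units (odd, to avoid vote ties), weight bound [-P, P]
n = len(X[0])
M = P * (max(sum(abs(v) for v in x) for x in X) + 1) + 1   # big-M bound on any pre-activation

prob = pulp.LpProblem("inn_training_sketch", pulp.LpMinimize)

# Integer weights and biases of the hidden layer.
W = [[pulp.LpVariable(f"w_{h}_{j}", -P, P, cat="Integer") for j in range(n)]
     for h in range(H)]
c = [pulp.LpVariable(f"c_{h}", -P, P, cat="Integer") for h in range(H)]

# a[i][h] = 1 iff hidden unit h outputs +1 on sample i; s[i] = margin slack.
a = [[pulp.LpVariable(f"a_{i}_{h}", cat="Binary") for h in range(H)]
     for i in range(len(X))]
s = [pulp.LpVariable(f"s_{i}", lowBound=0) for i in range(len(X))]

prob += pulp.lpSum(s)            # minimise total margin violation

for i, (x, yi) in enumerate(zip(X, y)):
    for h in range(H):
        pre = pulp.lpSum(W[h][j] * x[j] for j in range(n)) + c[h]
        # Big-M link between the integer pre-activation and the binary firing indicator.
        prob += pre <= -1 + (M + 1) * a[i][h]     # a = 0  =>  pre <= -1
        prob += pre >= -M * (1 - a[i][h])         # a = 1  =>  pre >= 0
    net = pulp.lpSum(2 * a[i][h] - 1 for h in range(H))   # fixed +1 output weights: majority vote
    prob += yi * net >= 1 - s[i]                  # hinge-style correctness constraint

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status], "total slack =", pulp.value(prob.objective))
print("hidden weights:", [[int(v.varValue) for v in row] for row in W])

Because the weights, biases and inputs are all integers, the sign test can be encoded exactly with the "-1" threshold and no epsilon. A model like this grows with every added sample, which is the scaling issue the paper's batch training method is aimed at.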
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable via our training procedure, including its gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Minibatch training of neural network ensembles via trajectory sampling [0.0]
We show that a minibatch approach can also be used to train neural network ensembles (NNEs) via trajectory methods in a highly efficient manner.
We illustrate this approach by training NNEs to classify images in the MNIST datasets.
arXiv Detail & Related papers (2023-06-23T11:12:33Z) - Transferring Learning Trajectories of Neural Networks [2.2299983745857896]
Training deep neural networks (DNNs) is computationally expensive.
We formulate the problem of "transferring" a given learning trajectory from one initial parameter to another one.
We empirically show that the transferred parameters achieve non-trivial accuracy before any direct training, and can be trained significantly faster than training from scratch.
arXiv Detail & Related papers (2023-05-23T14:46:32Z) - Multi-Objective Linear Ensembles for Robust and Sparse Training of Few-Bit Neural Networks [5.246498560938275]
We study the case of few-bit discrete-valued neural networks, both Binarized Neural Networks (BNNs) and Integer Neural Networks (INNs).
Our contribution is a multi-objective ensemble approach based on training a single NN for each possible pair of classes and applying a majority voting scheme to predict the final output (a minimal sketch of this pairwise voting scheme appears after this list).
We compare this BeMi approach to the current state-of-the-art in solver-based NN training and gradient-based training, focusing on BNN learning in few-shot contexts.
arXiv Detail & Related papers (2022-12-07T14:23:43Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Low-Precision Training in Logarithmic Number System using Multiplicative
Weight Update [49.948082497688404]
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts.
One promising approach to reduce the energy costs is representing DNNs with low-precision numbers.
We jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update training method, termed LNS-Madam.
arXiv Detail & Related papers (2021-06-26T00:32:17Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - A Meta-Learning Approach to the Optimal Power Flow Problem Under
Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z) - NN-EMD: Efficiently Training Neural Networks using Encrypted
Multi-Sourced Datasets [7.067870969078555]
Training a machine learning model over an encrypted dataset is an existing and promising approach to privacy-preserving machine learning.
We propose a novel framework, NN-EMD, to train a deep neural network (DNN) model over multiple datasets collected from multiple sources.
We evaluate our framework for performance with regards to the training time and model accuracy on the MNIST datasets.
arXiv Detail & Related papers (2020-12-18T23:01:20Z) - The training accuracy of two-layer neural networks: its estimation and
understanding using random datasets [0.0]
We propose a novel theory based on space partitioning to estimate the approximate training accuracy for two-layer neural networks on random datasets without training.
Our method estimates the training accuracy for two-layer fully-connected neural networks on two-class random datasets using only three arguments.
arXiv Detail & Related papers (2020-10-26T07:21:29Z)
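As noted in the BeMi entry above, here is a rough sketch of the one-vs-one ensemble with majority voting that it describes. The per-pair classifier below is a deliberately simple stand-in (a nearest-centroid rule) rather than the MILP-trained BNN/INN of that paper; only the pairwise training and voting structure is illustrated, and every name in it is hypothetical.

import itertools
import numpy as np

def fit_pair_model(Xa, Xb):
    # Stand-in binary classifier for one pair of classes: nearest centroid.
    # (BeMi instead trains a small BNN/INN for each pair with a MILP solver.)
    return Xa.mean(axis=0), Xb.mean(axis=0)

def pair_winner(model, labels, x):
    ca, cb = model
    a, b = labels
    return a if np.linalg.norm(x - ca) <= np.linalg.norm(x - cb) else b

def fit_ovo_ensemble(X, y):
    # One model per unordered pair of classes.
    classes = np.unique(y)
    models = {(a, b): fit_pair_model(X[y == a], X[y == b])
              for a, b in itertools.combinations(classes, 2)}
    return classes, models

def predict_ovo(classes, models, x):
    # Majority vote over all pairwise models.
    votes = {c: 0 for c in classes}
    for labels, model in models.items():
        votes[pair_winner(model, labels, x)] += 1
    return max(votes, key=votes.get)

# Tiny demonstration on three well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(20, 2)) for m in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)
classes, models = fit_ovo_ensemble(X, y)
print(predict_ovo(classes, models, np.array([2.9, 0.1])))   # expected: class 1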