A Mixed Integer Programming Approach to Training Dense Neural Networks
- URL: http://arxiv.org/abs/2201.00723v1
- Date: Mon, 3 Jan 2022 15:53:51 GMT
- Title: A Mixed Integer Programming Approach to Training Dense Neural Networks
- Authors: Vrishabh Patil and Yonatan Mintz
- Abstract summary: We propose novel mixed integer programming (MIP) formulations for training fully-connected ANNs.
Our formulations can account for both binary activation and rectified linear unit (ReLU) activation ANNs.
We also develop a layer-wise greedy approach for model pretraining, a technique adapted to reduce the number of layers in the ANN.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Neural Networks (ANNs) are prevalent machine learning models that
have been applied across various real-world classification tasks. ANNs require
a large amount of data to achieve strong out-of-sample performance, and many
algorithms for training ANN parameters are based on stochastic gradient descent
(SGD). However, the SGD-trained ANNs that tend to perform best on prediction
tasks are trained end to end, which requires a large number of model parameters
and random initialization. This makes training ANNs very time consuming, and
the resulting models take a lot of memory to deploy. In order to train more
parsimonious ANN models, we propose the use of alternative methods from the
constrained optimization literature for ANN training and pretraining. In
particular, we propose novel mixed integer programming (MIP) formulations for
training fully-connected ANNs. Our formulations can account for both binary
activation and rectified linear unit (ReLU) activation ANNs, and for the use of
a log-likelihood loss. We also develop a layer-wise greedy approach for model
pretraining using our MIP formulations, a technique adapted to reduce the
number of layers in the ANN. We then present numerical experiments comparing
our MIP-based methods against existing SGD-based approaches and show that we
are able to achieve models with competitive out-of-sample performance that are
significantly more parsimonious.
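
To make the MIP training idea concrete, here is a minimal sketch that trains a single binary-activation unit h(x) = 1{w.x + b >= 0} exactly as a mixed integer program: big-M constraints link the continuous pre-activation to the unit's binary output, and the objective minimizes the number of misclassified samples. This is only an illustration of the general technique, not the paper's formulation, which covers multi-layer networks, ReLU activations, and a log-likelihood loss; the solver choice (PuLP with CBC), the function name train_binary_unit_mip, the weight bounds, and the big-M and epsilon constants are all assumptions made for the example.

import numpy as np
import pulp

def train_binary_unit_mip(X, y, w_bound=10.0, eps=1e-3):
    """Exact MIP training of one binary-activation unit (illustrative sketch)."""
    n, d = X.shape
    # Big-M chosen large enough to bound |w.x + b| given the weight bounds.
    M = w_bound * (np.abs(X).sum(axis=1).max() + 1.0)

    prob = pulp.LpProblem("binary_unit_training", pulp.LpMinimize)
    w = [pulp.LpVariable(f"w_{j}", -w_bound, w_bound) for j in range(d)]
    b = pulp.LpVariable("b", -w_bound, w_bound)
    h = [pulp.LpVariable(f"h_{i}", cat=pulp.LpBinary) for i in range(n)]  # unit outputs
    e = [pulp.LpVariable(f"e_{i}", cat=pulp.LpBinary) for i in range(n)]  # error flags

    prob += pulp.lpSum(e)  # objective: number of misclassified samples

    for i in range(n):
        a_i = pulp.lpSum(float(X[i, j]) * w[j] for j in range(d)) + b
        # Big-M link: h[i] = 1 forces a_i >= 0, and h[i] = 0 forces a_i <= -eps.
        prob += a_i >= -M * (1 - h[i])
        prob += a_i <= -eps + (M + eps) * h[i]
        # e[i] upper-bounds the disagreement between the output h[i] and label y[i].
        prob += (e[i] >= 1 - h[i]) if y[i] == 1 else (e[i] >= h[i])

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return np.array([v.value() for v in w]), b.value()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable toy labels
    w, b = train_binary_unit_mip(X, y)
    print("train accuracy:", ((X @ w + b >= 0).astype(int) == y).mean())

For ReLU units, the analogous building block is the standard big-M encoding of z = max(0, a): z >= a, z >= 0, z <= a + M*(1 - delta), and z <= M*delta with a binary indicator delta. A layer-wise greedy pretraining scheme of the kind described in the abstract would then solve a sequence of such smaller subproblems layer by layer, fixing each trained layer and passing its activations forward, rather than one monolithic MIP for the whole network; the exact procedure is detailed in the paper.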
Related papers
- BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [56.9358325168226]
We propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND).
Our approach is simple but effective: we first use multiple trained model weights and biases as inputs to train an autoencoder and a latent diffusion model.
Our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model.
arXiv Detail & Related papers (2024-03-23T08:40:38Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for
Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z) - Large Deviations for Accelerating Neural Networks Training [5.864710987890994]
We propose the LAD Improved Iterative Training (LIIT), a novel training approach for ANN using large deviations principle.
The LIIT approach uses a Modified Training Sample (MTS) that is generated and iteratively updated using a LAD anomaly-score-based sampling strategy.
The MTS is designed to be representative of the training data by including the most anomalous observations in each class.
arXiv Detail & Related papers (2023-03-02T04:14:05Z) - Adversarial Learning Networks: Source-free Unsupervised Domain
Incremental Learning [0.0]
In a non-stationary environment, updating a DNN model requires parameter re-training or model fine-tuning.
We propose an unsupervised source-free method to update DNN classification models.
Unlike existing methods, our approach can update a DNN model incrementally for non-stationary source and target tasks without storing past training data.
arXiv Detail & Related papers (2023-01-28T02:16:13Z) - Boosted Dynamic Neural Networks [53.559833501288146]
A typical early-exiting dynamic neural network (EDNN) has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently in the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively.
arXiv Detail & Related papers (2022-11-30T04:23:12Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Low-Resource Music Genre Classification with Cross-Modal Neural Model
Reprogramming [129.4950757742912]
We introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).
NMR aims at re-purposing a pre-trained model from a source domain to a target domain by modifying the input of a frozen pre-trained model.
Experimental results suggest that a neural model pre-trained on large-scale datasets can successfully perform music genre classification by using this reprogramming method.
arXiv Detail & Related papers (2022-11-02T17:38:33Z) - Statistical process monitoring of artificial neural networks [1.3213490507208525]
In machine learning, the learned relationship between the input and the output must remain valid during the model's deployment.
We propose considering the latent feature representation of the data (called "embedding") generated by the ANN to determine the time when the data stream starts being nonstationary.
arXiv Detail & Related papers (2022-09-15T16:33:36Z) - An alternative approach to train neural networks using monotone
variational inequality [22.320632565424745]
We propose an alternative approach to neural network training using the monotone vector field.
Our approach can be used for more efficient fine-tuning of a pre-trained neural network.
arXiv Detail & Related papers (2022-02-17T19:24:20Z) - Low-bit Quantization of Recurrent Neural Network Language Models Using
Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)