Computational Tradeoffs of Optimization-Based Bound Tightening in ReLU
Networks
- URL: http://arxiv.org/abs/2312.16699v2
- Date: Tue, 30 Jan 2024 19:33:29 GMT
- Title: Computational Tradeoffs of Optimization-Based Bound Tightening in ReLU
Networks
- Authors: Fabian Badilla, Marcos Goycoolea, Gonzalo Muñoz, Thiago Serra
- Abstract summary: The use of Mixed-Integer Linear Programming (MILP) models to represent neural networks with Rectified Linear Unit (ReLU) activations has become increasingly widespread in the last decade.
This has enabled the use of MILP technology to test, or stress, their behavior, to adversarially improve their training, and to embed them in optimization models leveraging their predictive power.
We provide guidelines for implementing these models based on the impact of network structure, regularization, and rounding.
- Score: 4.01907644010256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of Mixed-Integer Linear Programming (MILP) models to represent neural
networks with Rectified Linear Unit (ReLU) activations has become increasingly
widespread in the last decade. This has enabled the use of MILP technology to
test, or stress, their behavior, to adversarially improve their training, and to
embed them in optimization models leveraging their predictive power. Many of
these MILP models rely on activation bounds, that is, bounds on the input
values of each neuron. In this work, we explore the tradeoff between the
tightness of these bounds and the computational effort of solving the resulting
MILP models. We provide guidelines for implementing these models based on the
impact of network structure, regularization, and rounding.
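As context for the tradeoff discussed in the abstract, the following is a minimal sketch of the standard big-M encoding of a single ReLU neuron $y = \max(0, w^\top x + b)$, assuming known activation bounds $L \le w^\top x + b \le U$ with $L < 0 < U$ (the notation is chosen here for illustration and is not taken from the paper):
  y \ge w^\top x + b, \qquad y \ge 0,
  y \le w^\top x + b - L(1 - z), \qquad y \le U z, \qquad z \in \{0, 1\}.
Optimization-based bound tightening refines $L$ and $U$ by maximizing and minimizing the pre-activation $w^\top x + b$ over a model (exact or relaxed) of the preceding layers, one subproblem per bound. Tighter bounds shrink the big-M coefficients and strengthen the relaxation, but each tightening subproblem adds computational effort, which is precisely the tradeoff explored in the paper.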
Related papers
- Diffusion Models as Network Optimizers: Explorations and Analysis [71.69869025878856]
Generative diffusion models (GDMs) have emerged as a promising new approach to network optimization.
In this study, we first explore the intrinsic characteristics of generative models.
We provide a concise theoretical and intuitive demonstration of the advantages of generative models over discriminative network optimization.
arXiv Detail & Related papers (2024-11-01T09:05:47Z)
- Bayesian Entropy Neural Networks for Physics-Aware Prediction [14.705526856205454]
We introduce BENN, a framework designed to impose constraints on Bayesian Neural Network (BNN) predictions.
BENN is capable of constraining not only the predicted values but also their derivatives and variances, ensuring a more robust and reliable model output.
Results highlight significant improvements over traditional BNNs and showcase competitive performance relative to contemporary constrained deep learning methods.
arXiv Detail & Related papers (2024-07-01T07:00:44Z)
- Deep learning enhanced mixed integer optimization: Learning to reduce model dimensionality [0.0]
This work introduces a framework to address the computational complexity inherent in Mixed-Integer Programming (MIP).
By employing deep learning, we construct problem-specific models that identify and exploit common structures across MIP instances.
We present an algorithm for generating synthetic data enhancing the robustness and generalizability of our models.
arXiv Detail & Related papers (2024-01-17T19:15:13Z)
- Optimization Over Trained Neural Networks: Taking a Relaxing Walk [4.517039147450688]
We propose a more scalable solver based on exploring global and local linear relaxations of the neural network model.
Our solver is competitive with a state-of-the-art MILP solver and the prior approach, while producing better solutions as the input dimension, depth, and number of neurons increase.
arXiv Detail & Related papers (2024-01-07T11:15:00Z)
- The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are widely used in practice.
In this paper, we examine the use of convex neural recovery models.
We show that the stationary points of the non-convex objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- Mixed-Integer Optimisation of Graph Neural Networks for Computer-Aided Molecular Design [4.593587844188084]
ReLU neural networks have been modelled as constraints in mixed integer linear programming (MILP).
We propose a formulation for ReLU Graph Convolutional Neural Networks and a MILP formulation for ReLU GraphSAGE models.
These formulations enable solving optimisation problems with trained GNNs embedded to global optimality.
arXiv Detail & Related papers (2023-12-02T21:10:18Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling [0.2900810893770134]
In some applications, such as microgrid day-ahead energy scheduling with a neural network-based battery degradation model, the input features of the trained learning model are decision variables of the optimization model.
The use of nonlinear activation functions in the neural network makes such problems extremely hard to solve, if not unsolvable.
This paper investigates different methods for linearizing the nonlinear activation functions, with a particular focus on the widely used rectified linear unit (ReLU) function.
arXiv Detail & Related papers (2023-10-03T02:47:38Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.