An analysis of optimization problems involving ReLU neural networks
- URL: http://arxiv.org/abs/2502.03016v1
- Date: Wed, 05 Feb 2025 09:18:07 GMT
- Title: An analysis of optimization problems involving ReLU neural networks
- Authors: Christoph Plate, Mirko Hahn, Alexander Klimek, Caroline Ganzer, Kai Sundmacher, Sebastian Sager
- Abstract summary: We study approaches to analyze and improve the run time behavior of mixed-integer programming solvers.
We numerically compare these approaches for three benchmark problems from the literature.
As a major takeaway we observe and quantify a trade-off between the often desired redundancy of neural network models versus the computational costs for solving related optimization problems.
- Score: 38.258426534664046
- Abstract: Solving mixed-integer optimization problems with embedded neural networks with ReLU activation functions is challenging. Big-M coefficients that arise in relaxing binary decisions related to these functions grow exponentially with the number of layers. We survey and propose different approaches to analyze and improve the run time behavior of mixed-integer programming solvers in this context. Among them are clipped variants and regularization techniques applied during training as well as optimization-based bound tightening and a novel scaling for given ReLU networks. We numerically compare these approaches for three benchmark problems from the literature. We use the number of linear regions, the percentage of stable neurons, and overall computational effort as indicators. As a major takeaway we observe and quantify a trade-off between the often desired redundancy of neural network models versus the computational costs for solving related optimization problems.
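The exponential big-M growth described in the abstract is easy to make concrete. Below is a minimal sketch (not the paper's code; the random toy network, layer sizes, and all names are illustrative assumptions) that propagates interval bounds through a ReLU network: the pre-activation bounds it produces are the usual source of big-M coefficients, and neurons whose bounds do not cross zero are exactly the stable neurons used as an indicator in the paper.

```python
# A minimal sketch (illustrative, not the paper's code): interval bound
# propagation through a ReLU network. Pre-activation bounds are the usual
# source of big-M coefficients; neurons whose bounds do not straddle zero
# are "stable" (their ReLU is fixed to identity or zero).
import numpy as np

def interval_bounds(weights, biases, lo, hi):
    """Propagate elementwise input bounds [lo, hi]; return per-layer
    pre-activation bounds (the big-M candidates)."""
    bounds = []
    for W, b in zip(weights, biases):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        pre_lo = W_pos @ lo + W_neg @ hi + b   # smallest pre-activation
        pre_hi = W_pos @ hi + W_neg @ lo + b   # largest pre-activation
        bounds.append((pre_lo, pre_hi))
        lo, hi = np.maximum(pre_lo, 0.0), np.maximum(pre_hi, 0.0)  # ReLU
    return bounds

rng = np.random.default_rng(0)                 # toy random 6-layer network
weights = [rng.standard_normal((20, 20)) for _ in range(6)]
biases = [rng.standard_normal(20) for _ in range(6)]

for k, (pre_lo, pre_hi) in enumerate(
        interval_bounds(weights, biases, -np.ones(20), np.ones(20))):
    stable = np.mean((pre_lo >= 0) | (pre_hi <= 0))  # fraction of stable neurons
    big_m = max(abs(pre_lo).max(), abs(pre_hi).max())
    print(f"layer {k}: max big-M candidate = {big_m:.1e}, stable = {stable:.0%}")
```

With Gaussian weights the printed bound grows by a roughly constant factor per layer, which is the exponential blow-up the abstract refers to; optimization-based bound tightening replaces these cheap interval bounds with per-neuron LP or MIP bounds to shrink them.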
Related papers
- Representation and Regression Problems in Neural Networks: Relaxation, Generalization, and Numerics [5.915970073098098]
We address three non-convex optimization problems associated with training shallow neural networks (NNs).
We convexify these problems and, applying a representer theorem, prove the absence of relaxation gaps.
We analyze the impact of key parameters on these bounds and propose optimal choices.
For high-dimensional datasets, we propose a sparsification algorithm that, combined with gradient descent, yields effective solutions.
arXiv Detail & Related papers (2024-12-02T15:40:29Z)
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources to each data instance.
Our method reduces inference cost while maintaining the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- Optimization Over Trained Neural Networks: Taking a Relaxing Walk [4.517039147450688]
We propose a more scalable solver based on exploring global and local linear relaxations of the neural network model.
Our solver is competitive with a state-of-the-art MILP solver and prior heuristics, and it produces better solutions as input dimension, depth, and number of neurons increase; a sketch of the local affine view it exploits follows this entry.
arXiv Detail & Related papers (2024-01-07T11:15:00Z)
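The "relaxing walk" rests on the fact that a ReLU network is affine on each activation region. Here is a hedged sketch (not the authors' implementation; the helper names and toy network are illustrative) of reading off the local affine model at a point, which is what makes walking between neighbouring linear regions cheap:

```python
# Illustrative sketch: a ReLU network is affine on the set of inputs that
# share one activation pattern, so the local model f(x) = A @ x + c on the
# region containing x0 can be read off by freezing that pattern.
import numpy as np

def local_affine_model(weights, biases, x0):
    """Return (A, c) with f(x) = A @ x + c on x0's linear region."""
    A, c = np.eye(len(x0)), np.zeros(len(x0))
    for W, b in zip(weights[:-1], biases[:-1]):        # hidden (ReLU) layers
        mask = (W @ (A @ x0 + c) + b > 0).astype(float)  # pattern at x0
        A = (mask[:, None] * W) @ A                    # ReLU becomes a fixed 0/1 mask
        c = mask * (W @ c + b)
    return weights[-1] @ A, weights[-1] @ c + biases[-1]  # final linear layer

def forward(weights, biases, x):
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(W @ x + b, 0.0)
    return weights[-1] @ x + biases[-1]

rng = np.random.default_rng(1)                 # toy 2-hidden-layer network
ws = [rng.standard_normal(s) for s in ((8, 4), (8, 8), (1, 8))]
bs = [rng.standard_normal(s) for s in (8, 8, 1)]
x0 = rng.standard_normal(4)
A, c = local_affine_model(ws, bs, x0)
assert np.allclose(A @ x0 + c, forward(ws, bs, x0))  # exact on x0's region
```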
- Acceleration techniques for optimization over trained neural network ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network; the standard encoding of one ReLU neuron is sketched after this entry.
arXiv Detail & Related papers (2021-12-13T20:50:54Z)
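For reference, the popular big-$M$ encoding such formulations build on represents a single neuron $y = \max(0, w^\top x + b)$ with a binary variable $z$ and finite pre-activation bounds $L \le w^\top x + b \le U$ (a textbook sketch, not taken from the paper):

```latex
\begin{align*}
  y &\ge w^\top x + b, & y &\ge 0,\\
  y &\le w^\top x + b - L(1 - z), & y &\le U z, & z &\in \{0, 1\}.
\end{align*}
```

Here $|L|$ and $U$ act as the big-$M$ constants, so tighter pre-activation bounds directly tighten the linear relaxation; this is what the bound tightening and scaling surveyed in the main paper target.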
- Neural Network Approximations of Compositional Functions With Applications to Dynamical Systems [3.660098145214465]
We develop an approximation theory for compositional functions and their neural network approximations.
We identify a set of key features of compositional functions and the relationship between the features and the complexity of neural networks.
In addition to function approximation, we prove several upper-bound formulae for the approximation error of neural networks.
arXiv Detail & Related papers (2020-12-03T04:40:25Z)
- Large-scale Neural Solvers for Partial Differential Equations [48.7576911714538]
Solving partial differential equations (PDEs) is an indispensable part of many branches of science, as many processes can be modelled in terms of PDEs.
Recent numerical solvers require manual discretization of the underlying equation as well as sophisticated, tailored code for distributed computing.
We examine the applicability of continuous, mesh-free neural solvers for partial differential equations, namely physics-informed neural networks (PINNs); their schematic training objective is shown after this entry.
We discuss the accuracy of GatedPINN with respect to analytical solutions as well as state-of-the-art numerical solvers, such as spectral solvers.
arXiv Detail & Related papers (2020-09-08T13:26:51Z)
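Schematically, PINNs replace mesh-based discretization with a residual loss at collocation points. For a PDE $\mathcal{N}[u] = f$ on $\Omega$ with boundary data $u = g$ on $\partial\Omega$, the training objective reads (a generic sketch; the weighting $\lambda$ and point counts $N_r$, $N_b$ are notational assumptions):

```latex
\min_{\theta}\;
  \frac{1}{N_r}\sum_{i=1}^{N_r} \bigl\| \mathcal{N}[u_\theta](x_i) - f(x_i) \bigr\|^2
  + \frac{\lambda}{N_b}\sum_{j=1}^{N_b} \bigl\| u_\theta(x_j) - g(x_j) \bigr\|^2
```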
- Efficient and Sparse Neural Networks by Pruning Weights in a Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating prediction accuracy and network complexity as two individual objective functions; a minimal magnitude-pruning sketch follows this entry.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with negligible loss of accuracy are possible.
arXiv Detail & Related papers (2020-08-31T13:28:03Z)
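As a toy illustration of the complexity objective (a minimal sketch, not the paper's multiobjective algorithm; the quantile threshold rule is an assumption), magnitude pruning trades the number of nonzero weights against accuracy:

```python
# Illustrative magnitude pruning: network complexity measured as the number
# of nonzero weights, reduced by zeroing the smallest-magnitude entries.
import numpy as np

def prune_by_magnitude(W, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of W's entries."""
    threshold = np.quantile(np.abs(W), sparsity)
    return np.where(np.abs(W) >= threshold, W, 0.0)

rng = np.random.default_rng(2)
W = rng.standard_normal((64, 64))
for s in (0.5, 0.9, 0.99):
    print(f"sparsity {s:.0%}: {np.count_nonzero(prune_by_magnitude(W, s))} nonzeros")
```

In the multiobjective setting, each sparsity level corresponds to a different point on the accuracy-versus-complexity trade-off curve.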
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game in which both players are parameterized by neural networks (NNs), and we learn the parameters of these networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in a form suitable for neural networks.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm that our multi-level graph network learns discretization-invariant solution operators for PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds than naive parallel approaches while retaining theoretical guarantees.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.