Tightening convex relaxations of trained neural networks: a unified approach for convex and S-shaped activations
- URL: http://arxiv.org/abs/2410.23362v1
- Date: Wed, 30 Oct 2024 18:09:53 GMT
- Title: Tightening convex relaxations of trained neural networks: a unified approach for convex and S-shaped activations
- Authors: Pablo Carrasco, Gonzalo Muñoz
- Abstract summary: We develop a formula that yields a tight convexification for the composition of an activation with an affine function.
Our approach can be used to efficiently compute separating hyperplanes or determine that none exists in various settings.
- Score: 0.0
- License:
- Abstract: The non-convex nature of trained neural networks has created significant obstacles in their incorporation into optimization models. Considering the wide array of applications that this embedding has, the optimization and deep learning communities have dedicated significant efforts to the convexification of trained neural networks. Many approaches to date have considered obtaining convex relaxations for each non-linear activation in isolation, which poses limitations in the tightness of the relaxations. Anderson et al. (2020) strengthened these relaxations and provided a framework to obtain the convex hull of the graph of a piecewise linear convex activation composed with an affine function; this effectively convexifies activations such as the ReLU together with the affine transformation that precedes it. In this article, we contribute to this line of work by developing a recursive formula that yields a tight convexification for the composition of an activation with an affine function for a wide scope of activation functions, namely, convex or ``S-shaped". Our approach can be used to efficiently compute separating hyperplanes or determine that none exists in various settings, including non-polyhedral cases. We provide computational experiments to test the empirical benefits of these convex approximations.
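For illustration, the following minimal Python sketch (an assumption of this summary, not code from the paper) shows the standard single-activation relaxation over a box that the abstract contrasts with joint convexification: interval arithmetic bounds the pre-activation, and the ReLU is then relaxed in isolation by its convex lower envelope and a secant upper bound. Function names and the NumPy setup are illustrative.

```python
import numpy as np

def preactivation_bounds(w, b, L, U):
    """Interval-arithmetic bounds on w @ x + b over the box L <= x <= U."""
    lo = b + np.sum(np.where(w >= 0, w * L, w * U))
    hi = b + np.sum(np.where(w >= 0, w * U, w * L))
    return lo, hi

def relu_isolated_relaxation(w, b, L, U):
    """Standard per-activation relaxation of y = ReLU(w @ x + b):
    the convex lower envelope max(0, w @ x + b) and a secant upper bound
    over the pre-activation interval [lo, hi]. Treating the activation in
    isolation like this is exactly the looseness that convexifying the
    activation together with its affine map is meant to reduce."""
    lo, hi = preactivation_bounds(w, b, L, U)

    def bounds(x):
        pre = float(w @ x + b)
        lower = max(0.0, pre)
        if hi <= 0:        # provably inactive neuron
            upper = 0.0
        elif lo >= 0:      # provably active neuron: relaxation is exact
            upper = pre
        else:              # unstable neuron: secant ("triangle") upper bound
            upper = hi * (pre - lo) / (hi - lo)
        return lower, upper

    return bounds

# Example: a two-input neuron over the box [-1, 1]^2.
w, b = np.array([1.0, -2.0]), 0.5
L, U = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
lower, upper = relu_isolated_relaxation(w, b, L, U)(np.array([0.3, 0.2]))
```

Tighter formulations, such as the convex hull of the composed map obtained by Anderson et al. (2020) and the recursive formula developed in this paper, cut off points that the isolated relaxation above admits.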
Related papers
- Achieving the Tightest Relaxation of Sigmoids for Formal Verification [9.118502451414082]
In this paper, we derive tuneable hyperplanes which upper- and lower-bound the sigmoid function.
$\alpha$-sig allows us to tractably incorporate the tightest possible, element-wise convex relaxation of the sigmoid activation function into a formal verification framework (an illustrative tangent/secant sketch appears after this list).
arXiv Detail & Related papers (2024-08-20T02:22:27Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used for a wide range of purposes.
In this paper, we examine the use of convex neural recovery models.
We show that all stationary points of the non-convex objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Partition-based formulations for mixed-integer optimization of trained ReLU neural networks [66.88252321870085]
This paper introduces a class of mixed-integer formulations for trained ReLU neural networks.
At one extreme, one partition per input recovers the convex hull of a node, i.e., the tightest possible formulation for each node.
arXiv Detail & Related papers (2021-02-08T17:27:34Z) - Convex Optimization with an Interpolation-based Projection and its Application to Deep Learning [36.19092177858517]
We investigate whether an inexact, but cheaper projection can drive a descent algorithm to an optimum.
Specifically, we propose an interpolation-based projection that is computationally cheap and easy to compute given a convex, domain-defining function.
arXiv Detail & Related papers (2020-11-13T16:52:50Z) - Ideal formulations for constrained convex optimization problems with indicator variables [2.578242050187029]
We study the convexification of a class of convex optimization problems with indicator variables and constraints on the indicators.
Unlike most of the previous work on convexification of sparse regression problems, we simultaneously consider the nonlinear non-separable objective, indicator variables, and constraints.
We derive ideal convexifications for problems with hierarchy, multi-collinearity, and sparsity constraints.
arXiv Detail & Related papers (2020-06-30T21:07:10Z) - The Convex Relaxation Barrier, Revisited: Tightened Single-Neuron Relaxations for Neural Network Verification [11.10637926254491]
We improve the effectiveness of propagation- and linear-optimization-based neural network verification algorithms with a new tightened convex relaxation for ReLU neurons.
We design two polynomial-time algorithms for neural network verification: a linear-programming-based algorithm that leverages the full power of our relaxation, and a fast propagation algorithm that generalizes existing approaches.
arXiv Detail & Related papers (2020-06-24T22:16:58Z) - Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - Optimization of Graph Total Variation via Active-Set-based Combinatorial Reconditioning [48.42916680063503]
We propose a novel adaptive preconditioning strategy for proximal algorithms on this problem class.
We show that nested-forest decomposition of the inactive edges yields a guaranteed local linear convergence rate.
Our results suggest that local convergence analysis can serve as a guideline for selecting variable metrics in proximal algorithms.
arXiv Detail & Related papers (2020-02-27T16:33:09Z)
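As referenced in the sigmoid-relaxation entry above, the following minimal sketch (illustrative only; not the $\alpha$-sig implementation) shows how tangent and secant lines give tuneable linear upper and lower bounds on the sigmoid, depending on which side of the origin the input interval lies.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tangent_line(t):
    """Tangent line to the sigmoid at the tuneable point t, as (slope, intercept).
    The sigmoid is convex on (-inf, 0] and concave on [0, inf), so a tangent
    taken at t <= 0 under-estimates the sigmoid on (-inf, 0], while a tangent
    taken at t >= 0 over-estimates it on [0, inf)."""
    s = sigmoid(t)
    slope = s * (1.0 - s)
    return slope, s - slope * t

def secant_line(l, u):
    """Secant of the sigmoid over [l, u], as (slope, intercept): an upper bound
    where the sigmoid is convex (u <= 0) and a lower bound where it is concave
    (l >= 0)."""
    sl, su = sigmoid(l), sigmoid(u)
    slope = (su - sl) / (u - l)
    return slope, sl - slope * l
```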