Achieving the Tightest Relaxation of Sigmoids for Formal Verification
- URL: http://arxiv.org/abs/2408.10491v2
- Date: Thu, 22 Aug 2024 00:10:24 GMT
- Title: Achieving the Tightest Relaxation of Sigmoids for Formal Verification
- Authors: Samuel Chevalier, Duncan Starkenburg, Krishnamurthy Dvijotham,
- Abstract summary: In this paper, we derive tuneable hyperplanes which upper- and lower-bound the sigmoid function.
$\alpha$-sig allows us to tractably incorporate the tightest possible, element-wise convex relaxation of the sigmoid activation function into a formal verification framework.
- Score: 9.118502451414082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of formal verification, Neural Networks (NNs) are typically reformulated into equivalent mathematical programs which are optimized over. To overcome the inherent non-convexity of these reformulations, convex relaxations of nonlinear activation functions are typically utilized. Common relaxations (i.e., static linear cuts) of "S-shaped" activation functions, however, can be overly loose, slowing down the overall verification process. In this paper, we derive tuneable hyperplanes which upper and lower bound the sigmoid activation function. When tuned in the dual space, these affine bounds smoothly rotate around the nonlinear manifold of the sigmoid activation function. This approach, termed $\alpha$-sig, allows us to tractably incorporate the tightest possible, element-wise convex relaxation of the sigmoid activation function into a formal verification framework. We embed these relaxations inside of large verification tasks and compare their performance to LiRPA and $\alpha$-CROWN, a state-of-the-art verification duo.
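To make the idea of tunable affine bounds concrete, below is a minimal numerical sketch (an illustration, not the paper's exact formulation or code): it uses tangent lines to the sigmoid, with the tangent point acting as the tuning parameter, and checks that they upper-bound the sigmoid on its concave region and lower-bound it on its convex region. The interval, tangent points, and function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tangent_bound(x, a):
    """Affine function tangent to the sigmoid at point `a`:
    t_a(x) = sigma(a) + sigma(a) * (1 - sigma(a)) * (x - a)."""
    s = sigmoid(a)
    return s + s * (1.0 - s) * (x - a)

# The sigmoid is concave for x >= 0 and convex for x <= 0, so any tangent
# line taken on the concave side upper-bounds it there, and any tangent
# line taken on the convex side lower-bounds it there. Sliding the tangent
# point "rotates" the affine bound around the curve; the tightest choice
# depends on the objective, which is what tuning in the dual space selects.
x_pos = np.linspace(0.0, 6.0, 400)   # concave region (illustrative interval)
x_neg = np.linspace(-6.0, 0.0, 400)  # convex region  (illustrative interval)

for a in (0.5, 1.5, 3.0):            # candidate tangent points (tuning knob)
    assert np.all(tangent_bound(x_pos, a) >= sigmoid(x_pos) - 1e-12)   # upper bound
    assert np.all(tangent_bound(x_neg, -a) <= sigmoid(x_neg) + 1e-12)  # lower bound

print("tangent-line bounds verified on the sampled intervals")
```

Note that the paper derives its bounds for bounded pre-activation intervals and tunes them jointly with the verification dual; the sketch above only illustrates how a single tuning parameter generates a family of valid affine bounds.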
Related papers
- Tightening convex relaxations of trained neural networks: a unified approach for convex and S-shaped activations [0.0]
We develop a formula that yields a tight convexification for the composition of an activation with an affine function.
Our approach can be used to efficiently compute separating hyperplanes or determine that none exists in various settings.
arXiv Detail & Related papers (2024-10-30T18:09:53Z)
- Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference [54.2589824716527]
Large language models incur substantial computation and memory movement costs due to their large scale.
Existing approaches separate outliers and normal values into two matrices or migrate outliers from activations to weights, suffering from high latency or accuracy degradation.
We propose Rotated Runtime Smooth (RRS), a plug-and-play activation smoother for quantization, consisting of Runtime Smooth and Rotation operations.
The proposed method outperforms the state-of-the-art method in the LLaMA and Qwen families and improves WikiText-2 perplexity from 57.33 to 6.66 for INT4 inference.
arXiv Detail & Related papers (2024-09-30T14:59:22Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Linear Oscillation: A Novel Activation Function for Vision Transformer [0.0]
We present the Linear Oscillation (LoC) activation function, defined as $f(x) = x \times \sin(\alpha x + \beta)$.
Distinct from conventional activation functions which primarily introduce non-linearity, LoC seamlessly blends linear trajectories with oscillatory deviations.
Our empirical studies reveal that, when integrated into diverse neural architectures, the LoC activation function consistently outperforms established counterparts like ReLU and Sigmoid.
arXiv Detail & Related papers (2023-08-25T20:59:51Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Neural Network Verification as Piecewise Linear Optimization: Formulations for the Composition of Staircase Functions [2.088583843514496]
We present a technique for neural network verification using mixed-integer programming (MIP) formulations.
We derive a strong formulation for each neuron in a network using piecewise linear activation functions.
We also derive a separation procedure that runs in super-linear time in the input dimension.
arXiv Detail & Related papers (2022-11-27T03:25:48Z)
- Active Nearest Neighbor Regression Through Delaunay Refinement [79.93030583257597]
We introduce an algorithm for active function approximation based on nearest neighbor regression.
Our Active Nearest Neighbor Regressor (ANNR) relies on the Voronoi-Delaunay framework from computational geometry to subdivide the space into cells with constant estimated function value.
arXiv Detail & Related papers (2022-06-16T10:24:03Z)
- Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Due to the complex non-linear characteristics of samples, the objective of these activation functions is to project samples from their original feature space into a linearly separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
- LinSyn: Synthesizing Tight Linear Bounds for Arbitrary Neural Network Activation Functions [4.03308187036005]
LinSyn is the first approach that achieves tight bounds for any arbitrary activation function.
We show that our approach often achieves 2-5X tighter final output bounds and more than quadruple certified robustness.
arXiv Detail & Related papers (2022-01-31T17:00:50Z)
- Multiway Non-rigid Point Cloud Registration via Learned Functional Map Synchronization [105.14877281665011]
We present SyNoRiM, a novel way to register multiple non-rigid shapes by synchronizing the maps relating learned functions defined on the point clouds.
We demonstrate via extensive experiments that our method achieves a state-of-the-art performance in registration accuracy.
arXiv Detail & Related papers (2021-11-25T02:37:59Z)
- Parametric Rectified Power Sigmoid Units: Learning Nonlinear Neural Transfer Analytical Forms [1.6975704972827304]
The paper proposes representation functionals in a dual paradigm where learning jointly concerns linear convolutional weights and parametric forms of nonlinear activation functions.
The nonlinear forms proposed for performing the functional representation are associated with a new class of parametric neural transfer functions called rectified power sigmoid units.
The performance achieved by jointly learning convolutional and rectified power sigmoid parameters is shown to be outstanding in both shallow and deep learning frameworks.
arXiv Detail & Related papers (2021-01-25T08:25:22Z)