Convex Bounds on the Softmax Function with Applications to Robustness Verification
- URL: http://arxiv.org/abs/2303.01713v1
- Date: Fri, 3 Mar 2023 05:07:02 GMT
- Title: Convex Bounds on the Softmax Function with Applications to Robustness Verification
- Authors: Dennis Wei, Haoze Wu, Min Wu, Pin-Yu Chen, Clark Barrett, Eitan Farchi
- Abstract summary: The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well.
This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models.
- Score: 69.09991317119679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The softmax function is a ubiquitous component at the output of neural
networks and increasingly in intermediate layers as well. This paper provides
convex lower bounds and concave upper bounds on the softmax function, which are
compatible with convex optimization formulations for characterizing neural
networks and other ML models. We derive bounds using both a natural
exponential-reciprocal decomposition of the softmax as well as an alternative
decomposition in terms of the log-sum-exp function. The new bounds are provably
and/or numerically tighter than linear bounds obtained in previous work on
robustness verification of transformers. As illustrations of the utility of the
bounds, we apply them to verification of transformers as well as of the
robustness of predictive uncertainty estimates of deep ensembles.
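For orientation, the sketch below spells out the two decompositions named in the abstract, the exponential-reciprocal form softmax_i(x) = exp(x_i) * (1 / sum_j exp(x_j)) and the log-sum-exp form softmax_i(x) = exp(x_i - LSE(x)), and checks a standard element-wise interval bound on the softmax over a box of logits. The interval bound is included only as a familiar baseline, not the convex lower / concave upper bounds derived in the paper, and the function names below are ours.

```python
import numpy as np

def softmax_exp_recip(x):
    """Exponential-reciprocal decomposition: softmax_i(x) = exp(x_i) * (1 / sum_j exp(x_j))."""
    e = np.exp(x - np.max(x))          # shifting by max(x) leaves the ratio unchanged
    return e * (1.0 / e.sum())

def softmax_lse(x):
    """Log-sum-exp decomposition: softmax_i(x) = exp(x_i - LSE(x)), LSE(x) = log sum_j exp(x_j)."""
    lse = np.max(x) + np.log(np.sum(np.exp(x - np.max(x))))
    return np.exp(x - lse)

def softmax_box_bounds(l, u):
    """Element-wise bounds on softmax over the box l <= x <= u (baseline, not the paper's bounds).
    softmax_i is increasing in x_i and decreasing in x_j (j != i), so output i is minimized at
    x_i = l_i, x_j = u_j and maximized at x_i = u_i, x_j = l_j."""
    el, eu = np.exp(l), np.exp(u)
    lo = el / (el + (eu.sum() - eu))   # denominator: exp(l_i) + sum_{j != i} exp(u_j)
    hi = eu / (eu + (el.sum() - el))   # denominator: exp(u_i) + sum_{j != i} exp(l_j)
    return lo, hi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=5)
    assert np.allclose(softmax_exp_recip(x), softmax_lse(x))
    l, u = x - 0.1, x + 0.1            # a small perturbation box around the logits
    lo, hi = softmax_box_bounds(l, u)
    s = softmax_lse(x)
    assert np.all(lo <= s) and np.all(s <= hi)
```

Per the abstract, the paper's bounds are convex (lower) and concave (upper) functions of the inputs and hence usable inside convex optimization formulations; the box bounds above are just constants over the box and serve only to illustrate the two decompositions.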
Related papers
- Moreau Envelope ADMM for Decentralized Weakly Convex Optimization [55.2289666758254]
This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization.
The results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
arXiv Detail & Related papers (2023-08-31T14:16:30Z) - Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z) - Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation [2.3813678058429626]
The softmax function is widely used in artificial neural networks for multiclass classification problems.
In this paper, we provide an empirical study of a simple and concise softmax variant, namely sparse-softmax, to alleviate the problems the traditional softmax encounters in high-dimensional classification.
arXiv Detail & Related papers (2021-12-23T09:53:38Z) - SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention.
Various attempts at approximating self-attention with linear complexity have been made in natural language processing.
We identify that their limitations are rooted in retaining the softmax in self-attention during these approximations.
For the first time, a softmax-free transformer, SOFT, is proposed.
arXiv Detail & Related papers (2021-10-22T17:57:29Z) - Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in
Attention Mechanism [8.007523868483085]
Softmax is widely used in neural networks for multiclass classification, gate structure and attention mechanisms.
In this work, we suggest replacing the exponential function with periodic functions and investigate several potential periodic alternatives to Softmax.
Our method is shown to alleviate the gradient vanishing problem and to yield substantial improvements over Softmax and its variants.
arXiv Detail & Related papers (2021-08-16T15:26:31Z) - Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks [0.0]
We prove sharp lower bounds on the approximation rates for shallow neural networks.
These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU.
arXiv Detail & Related papers (2021-06-28T22:01:42Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions [73.33961743410876]
A soft-max function has two main efficiency measures: approximation and smoothness.
We identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness.
This leads to novel soft-max functions, each of which is optimal for a different application.
arXiv Detail & Related papers (2020-10-22T05:19:58Z)