Convex Bounds on the Softmax Function with Applications to Robustness Verification
- URL: http://arxiv.org/abs/2303.01713v1
- Date: Fri, 3 Mar 2023 05:07:02 GMT
- Title: Convex Bounds on the Softmax Function with Applications to Robustness Verification
- Authors: Dennis Wei, Haoze Wu, Min Wu, Pin-Yu Chen, Clark Barrett, Eitan Farchi
- Abstract summary: The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well.
This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models.
- Score: 69.09991317119679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The softmax function is a ubiquitous component at the output of neural
networks and increasingly in intermediate layers as well. This paper provides
convex lower bounds and concave upper bounds on the softmax function, which are
compatible with convex optimization formulations for characterizing neural
networks and other ML models. We derive bounds using both a natural
exponential-reciprocal decomposition of the softmax as well as an alternative
decomposition in terms of the log-sum-exp function. The new bounds are provably
and/or numerically tighter than linear bounds obtained in previous work on
robustness verification of transformers. As illustrations of the utility of the
bounds, we apply them to verification of transformers as well as of the
robustness of predictive uncertainty estimates of deep ensembles.
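For orientation, the sketch below spells out the two decompositions named in the abstract, the exponential-reciprocal form softmax_i(x) = exp(x_i) * (1 / sum_j exp(x_j)) and the log-sum-exp form softmax_i(x) = exp(x_i - LSE(x)), and checks a standard element-wise interval bound on the softmax over a box of logits. The interval bound is included only as a familiar baseline, not the convex lower / concave upper bounds derived in the paper, and the function names below are ours.

```python
import numpy as np

def softmax_exp_recip(x):
    """Exponential-reciprocal decomposition: softmax_i(x) = exp(x_i) * (1 / sum_j exp(x_j))."""
    e = np.exp(x - np.max(x))          # shifting by max(x) leaves the ratio unchanged
    return e * (1.0 / e.sum())

def softmax_lse(x):
    """Log-sum-exp decomposition: softmax_i(x) = exp(x_i - LSE(x)), LSE(x) = log sum_j exp(x_j)."""
    lse = np.max(x) + np.log(np.sum(np.exp(x - np.max(x))))
    return np.exp(x - lse)

def softmax_box_bounds(l, u):
    """Element-wise bounds on softmax over the box l <= x <= u (baseline, not the paper's bounds).
    softmax_i is increasing in x_i and decreasing in x_j (j != i), so output i is minimized at
    x_i = l_i, x_j = u_j and maximized at x_i = u_i, x_j = l_j."""
    el, eu = np.exp(l), np.exp(u)
    lo = el / (el + (eu.sum() - eu))   # denominator: exp(l_i) + sum_{j != i} exp(u_j)
    hi = eu / (eu + (el.sum() - el))   # denominator: exp(u_i) + sum_{j != i} exp(l_j)
    return lo, hi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=5)
    assert np.allclose(softmax_exp_recip(x), softmax_lse(x))
    l, u = x - 0.1, x + 0.1            # a small perturbation box around the logits
    lo, hi = softmax_box_bounds(l, u)
    s = softmax_lse(x)
    assert np.all(lo <= s) and np.all(s <= hi)
```

Per the abstract, the paper's bounds are convex (lower) and concave (upper) functions of the inputs and hence usable inside convex optimization formulations; the box bounds above are just constants over the box and serve only to illustrate the two decompositions.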
Related papers
- Moreau Envelope ADMM for Decentralized Weakly Convex Optimization [55.2289666758254]
This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization.
The results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
arXiv Detail & Related papers (2023-08-31T14:16:30Z) - Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z) - Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation [2.3813678058429626]
The softmax function is widely used in artificial neural networks for multiclass classification problems.
In this paper, we provide an empirical study of a simple and concise softmax variant, namely sparse-softmax, to alleviate the problems the traditional softmax encounters in high-dimensional classification.
arXiv Detail & Related papers (2021-12-23T09:53:38Z) - SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention.
Various attempts at approximating self-attention with linear complexity have been made in natural language processing.
We identify that their limitations are rooted in retaining the softmax in self-attention during these approximations.
For the first time, a softmax-free transformer, SOFT, is proposed.
arXiv Detail & Related papers (2021-10-22T17:57:29Z) - Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in
Attention Mechanism [8.007523868483085]
Softmax is widely used in neural networks for multiclass classification, gate structure and attention mechanisms.
In this work, we suggest replacing the exponential function with periodic functions and investigate several potential periodic alternatives to Softmax.
Our method is shown to alleviate the gradient vanishing problem and to yield substantial improvements over Softmax and its variants.
arXiv Detail & Related papers (2021-08-16T15:26:31Z) - Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks [0.0]
We prove sharp lower bounds on the approximation rates for shallow neural networks.
These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU.
arXiv Detail & Related papers (2021-06-28T22:01:42Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions [73.33961743410876]
A soft-max function has two main efficiency measures: approximation and smoothness.
We identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness.
This leads to novel soft-max functions, each of which is optimal for a different application.
arXiv Detail & Related papers (2020-10-22T05:19:58Z)