Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions
- URL: http://arxiv.org/abs/2010.11450v1
- Date: Thu, 22 Oct 2020 05:19:58 GMT
- Title: Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions
- Authors: Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Manolis
Zampetakis
- Abstract summary: A soft-max function has two main efficiency measures: approximation and smoothness.
We identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness.
This leads to novel soft-max functions, each of which is optimal for a different application.
- Score: 73.33961743410876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A soft-max function has two main efficiency measures: (1) approximation -
which corresponds to how well it approximates the maximum function, (2)
smoothness - which shows how sensitive it is to changes of its input. Our goal
is to identify the optimal approximation-smoothness tradeoffs for different
measures of approximation and smoothness. This leads to novel soft-max
functions, each of which is optimal for a different application. The most
commonly used soft-max function, called exponential mechanism, has optimal
tradeoff between approximation measured in terms of expected additive
approximation and smoothness measured with respect to Rényi Divergence. We
introduce a soft-max function, called "piecewise linear soft-max", with optimal
tradeoff between approximation, measured in terms of worst-case additive
approximation and smoothness, measured with respect to $\ell_q$-norm. The
worst-case approximation guarantee of the piecewise linear mechanism enforces
sparsity in the output of our soft-max function, a property that is known to be
important in Machine Learning applications [Martins et al. '16, Laha et al.
'18] and is not satisfied by the exponential mechanism. Moreover, the
$\ell_q$-smoothness is suitable for applications in Mechanism Design and Game
Theory where the piecewise linear mechanism outperforms the exponential
mechanism. Finally, we investigate another soft-max function, called power
mechanism, with optimal tradeoff between expected multiplicative
approximation and smoothness with respect to the Rényi Divergence, which
provides improved theoretical and practical results in differentially private
submodular optimization.
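To make the two notions above concrete, here is a minimal NumPy sketch, not the paper's exact constructions: the usual temperature-scaled exponential soft-max (the "exponential mechanism") and a sparsemax-style piecewise linear soft-max in the spirit of Martins et al. '16. The function names, the temperature parameter, and the sparsemax formulation are illustrative assumptions; the sketch only shows how the temperature trades approximation against smoothness, and how a piecewise linear soft-max can output exact zeros, the sparsity property discussed in the abstract.
```python
import numpy as np

def exponential_mechanism(scores, temperature=1.0):
    """Exponential soft-max: p_i proportional to exp(scores_i / temperature).

    A lower temperature approximates the maximum more tightly but is less
    smooth (more sensitive to perturbations of the scores); a higher
    temperature is smoother but a looser approximation.
    """
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                      # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def sparsemax(scores):
    """Sparsemax-style piecewise linear soft-max (Martins et al. '16):
    the Euclidean projection of the score vector onto the probability
    simplex. Unlike the exponential mechanism it can assign exactly zero
    weight to low-scoring coordinates, so its output is sparse.
    """
    s = np.asarray(scores, dtype=float)
    u = np.sort(s)[::-1]              # scores sorted in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(s) + 1)
    support = 1.0 + k * u > css       # coordinates kept in the support
    k_star = k[support][-1]           # size of the support
    tau = (css[k_star - 1] - 1.0) / k_star
    return np.maximum(s - tau, 0.0)

if __name__ == "__main__":
    s = [3.0, 1.0, 0.1, 0.0]
    print(exponential_mechanism(s, temperature=0.5))  # dense: every entry > 0
    print(sparsemax(s))                               # sparse: [1. 0. 0. 0.]
```
On these example scores the exponential soft-max puts nonzero weight on every coordinate, while the piecewise linear variant concentrates all mass on the leading score, which is exactly the dense-versus-sparse contrast the abstract draws.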
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Covariance Matrix Adaptation Evolutionary Strategy with Worst-Case
Ranking Approximation for Min--Max Optimization and its Application to
Berthing Control Tasks [19.263468901608785]
We consider a continuous min-max optimization problem $\min_{x \in \mathbb{X}} \max_{y \in \mathbb{Y}} f(x,y)$ whose objective function is a black-box.
We propose a novel approach to minimize the worst-case objective function $F(x) = \max_{y} f(x,y)$ directly.
arXiv Detail & Related papers (2023-03-28T15:50:56Z) - Inference on Optimal Dynamic Policies via Softmax Approximation [27.396891119011215]
We show that a simple soft-max approximation to the optimal treatment regime can achieve valid inference on the truly optimal regime.
Our work combines techniques from semi-parametric inference and $g$-estimation, together with an appropriate array central limit theorem.
arXiv Detail & Related papers (2023-03-08T07:42:47Z) - Convex Bounds on the Softmax Function with Applications to Robustness
Verification [69.09991317119679]
The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well.
This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models.
arXiv Detail & Related papers (2023-03-03T05:07:02Z) - Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT)
arXiv Detail & Related papers (2022-07-05T03:08:27Z) - Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z) - Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation [2.3813678058429626]
The softmax function is widely used in artificial neural networks for the multiclass classification problems.
In this paper, we provide an empirical study of a simple and concise softmax variant, namely sparse-softmax, to alleviate the problems that the traditional softmax encounters in high-dimensional classification.
arXiv Detail & Related papers (2021-12-23T09:53:38Z) - Stabilizing Q Learning Via Soft Mellowmax Operator [12.208344427928466]
Mellowmax is a recently proposed differentiable and non-expansive softmax operator that allows convergent behavior in learning and planning.
We show that our SM2 operator can be applied to the challenging multi-agent reinforcement learning scenarios, leading to stable value function approximation and state of the art performance.
arXiv Detail & Related papers (2020-12-17T09:11:13Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.