Exploring Alternatives to Softmax Function
- URL: http://arxiv.org/abs/2011.11538v1
- Date: Mon, 23 Nov 2020 16:50:18 GMT
- Title: Exploring Alternatives to Softmax Function
- Authors: Kunal Banerjee, Vishak Prasad C, Rishi Raj Gupta, Karthik Vyas,
Anushree H, Biswajit Mishra
- Abstract summary: We investigate Taylor softmax, SM-softmax and our proposed SM-Taylor softmax as alternatives to the softmax function.
Our experiments for the image classification task on different datasets reveal that there is always a configuration of the SM-Taylor softmax function that outperforms the normal softmax function.
- Score: 0.5924831288313849
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The softmax function is widely used in artificial neural networks for
multiclass classification, multilabel classification, attention mechanisms, etc.
However, its efficacy is often questioned in the literature. The log-softmax loss has been
shown to belong to a more generic class of loss functions, called spherical
family, and its member log-Taylor softmax loss is arguably the best alternative
in this class. In another approach which tries to enhance the discriminative
nature of the softmax function, soft-margin softmax (SM-softmax) has been
proposed to be the most suitable alternative. In this work, we investigate
Taylor softmax, SM-softmax and our proposed SM-Taylor softmax, an amalgamation
of the earlier two functions, as alternatives to the softmax function. Furthermore,
we explore the effect of expanding Taylor softmax up to ten terms (original
work proposed expanding only to two terms) along with the ramifications of
considering Taylor softmax to be a finite or infinite series during
backpropagation. Our experiments for the image classification task on different
datasets reveal that there is always a configuration of the SM-Taylor softmax
function that outperforms the normal softmax function and its other
alternatives.
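The abstract describes three drop-in replacements for softmax: Taylor softmax (the exponential replaced by its truncated Taylor polynomial), soft-margin softmax (a margin subtracted from the target-class logit), and their combination, SM-Taylor softmax. The NumPy sketch below illustrates one plausible reading of these definitions; the function names, the default order n and margin m, and the exact way the margin is combined with the Taylor polynomial are illustrative assumptions, not the authors' code.

```python
import numpy as np
from math import factorial

def taylor_exp(z, n=2):
    """Degree-n Taylor polynomial of exp(z): sum_{k=0..n} z**k / k!."""
    return sum(z**k / factorial(k) for k in range(n + 1))

def taylor_softmax(logits, n=2):
    """Taylor softmax: normalize the Taylor polynomial instead of exp.
    Even n keeps every term strictly positive, so normalization is safe."""
    num = taylor_exp(np.asarray(logits, dtype=float), n)
    return num / num.sum(axis=-1, keepdims=True)

def sm_softmax(logits, target, m=0.3):
    """Soft-margin softmax: subtract margin m from the target-class logit
    before a numerically stabilized softmax (training-time trick)."""
    z = np.asarray(logits, dtype=float).copy()
    z[..., target] -= m
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sm_taylor_softmax(logits, target, n=2, m=0.3):
    """SM-Taylor softmax: the same margin applied before the Taylor softmax."""
    z = np.asarray(logits, dtype=float).copy()
    z[..., target] -= m
    return taylor_softmax(z, n)

if __name__ == "__main__":
    logits = np.array([2.0, 1.0, 0.1])
    print(taylor_softmax(logits, n=2))              # order-2 Taylor softmax
    print(sm_softmax(logits, target=0))             # margin on class 0
    print(sm_taylor_softmax(logits, target=0, n=4)) # combined variant
```

The paper additionally studies expansions up to ten terms and whether the series is treated as finite or infinite during backpropagation; the sketch only covers the forward computation.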
Related papers
- MultiMax: Sparse and Multi-Modal Attention Learning [60.49318008131978]
SoftMax is a ubiquitous ingredient of modern machine learning algorithms.
We show that sparsity can be achieved by a family of SoftMax variants, but they often require an alternative loss function and do not preserve multi-modality.
We propose MultiMax, which adaptively modulates the output distribution according to the range of the input entries.
arXiv Detail & Related papers (2024-06-03T10:51:43Z)
- r-softmax: Generalized Softmax with Controllable Sparsity Rate [11.39524236962986]
We propose r-softmax, a modification of the softmax that outputs a sparse probability distribution with a controllable sparsity rate.
We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax.
arXiv Detail & Related papers (2023-04-11T14:28:29Z)
- Convex Bounds on the Softmax Function with Applications to Robustness Verification [69.09991317119679]
The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well.
This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models.
arXiv Detail & Related papers (2023-03-03T05:07:02Z)
- Spectral Aware Softmax for Visible-Infrared Person Re-Identification [123.69049942659285]
Visible-infrared person re-identification (VI-ReID) aims to match specific pedestrian images from different modalities.
Existing methods still follow the softmax loss training paradigm, which is widely used in single-modality classification tasks.
We propose the spectral-aware softmax (SA-Softmax) loss, which can fully explore the embedding space with the modality information.
arXiv Detail & Related papers (2023-02-03T02:57:18Z)
- Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
- Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation [2.3813678058429626]
The softmax function is widely used in artificial neural networks for multiclass classification problems.
In this paper, we provide an empirical study of a simple and concise softmax variant, namely sparse-softmax, to alleviate the problems that the traditional softmax encounters in high-dimensional classification.
arXiv Detail & Related papers (2021-12-23T09:53:38Z)
- Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling [0.0]
We show that there are more aspects to the Softmax bottleneck in SBRSs.
We propose a simple yet effective method, Dropout and Decoupling (D&D), to alleviate these problems.
Our method significantly improves the accuracy of a variety of Softmax-based SBRS algorithms.
arXiv Detail & Related papers (2021-10-11T16:52:23Z)
- Sparse Attention with Linear Units [60.399814410157425]
We introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU.
Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms.
Our analysis shows that ReLA delivers a high sparsity rate and head diversity, and the induced cross attention achieves better accuracy with respect to source-target word alignment (a minimal sketch of the softmax-to-ReLU swap appears after this list).
arXiv Detail & Related papers (2021-04-14T17:52:38Z)
- Stabilizing Q Learning Via Soft Mellowmax Operator [12.208344427928466]
Mellowmax is a differentiable, non-expansive softmax operator that allows convergent behavior in learning and planning (a minimal sketch of the operator appears after this list).
We show that our SM2 operator can be applied to challenging multi-agent reinforcement learning scenarios, leading to stable value function approximation and state-of-the-art performance.
arXiv Detail & Related papers (2020-12-17T09:11:13Z)
- Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions [73.33961743410876]
A soft-max function has two main efficiency measures: approximation and smoothness.
We identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness.
This leads to novel soft-max functions, each of which is optimal for a different application.
arXiv Detail & Related papers (2020-10-22T05:19:58Z)
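As a companion to the Sparse Attention with Linear Units entry above, here is a minimal sketch of the softmax-to-ReLU swap in scaled dot-product attention. Shapes and scaling are illustrative assumptions, and the extra output normalization the ReLA paper uses to keep training stable is omitted.

```python
import numpy as np

def rela_attention(Q, K, V):
    """Scaled dot-product attention with the softmax replaced by ReLU,
    giving exactly-zero (sparse) attention weights. The normalization the
    ReLA paper adds on top of this is omitted in this sketch."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # (num_queries, num_keys)
    weights = np.maximum(scores, 0.0)  # ReLU instead of softmax
    return weights @ V                 # (num_queries, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))
K = rng.normal(size=(7, 16))
V = rng.normal(size=(7, 32))
out = rela_attention(Q, K, V)          # shape (5, 32)
```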
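For the Soft Mellowmax entry above, the sketch below implements only the standard mellowmax definition, mm_omega(x) = (1/omega) * log((1/n) * sum_i exp(omega * x_i)); the soft mellowmax (SM2) extension proposed in that paper is not reproduced here, and omega is an illustrative temperature parameter.

```python
import numpy as np

def mellowmax(x, omega=5.0):
    """Mellowmax: a differentiable, non-expansive softening of max,
    mm_omega(x) = (1/omega) * log(mean(exp(omega * x))).
    Computed in log-sum-exp form for numerical stability (omega > 0)."""
    x = np.asarray(x, dtype=float)
    c = x.max()
    return c + np.log(np.mean(np.exp(omega * (x - c)))) / omega

print(mellowmax([1.0, 2.0, 3.0], omega=1.0))   # between mean and max
print(mellowmax([1.0, 2.0, 3.0], omega=50.0))  # approaches max(x) = 3
```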
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.