r-softmax: Generalized Softmax with Controllable Sparsity Rate
- URL: http://arxiv.org/abs/2304.05243v3
- Date: Fri, 21 Apr 2023 14:41:43 GMT
- Title: r-softmax: Generalized Softmax with Controllable Sparsity Rate
- Authors: Klaudia Bałazy, Łukasz Struski, Marek Śmieja, Jacek Tabor
- Abstract summary: We propose r-softmax, a modification of the softmax that outputs a sparse probability distribution with a controllable sparsity rate.
We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax.
- Score: 11.39524236962986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, artificial neural network models achieve remarkable results in many
disciplines. Functions mapping the representation provided by the model to a
probability distribution are an inseparable aspect of deep learning solutions.
Although softmax is a commonly accepted probability mapping function in the
machine learning community, it cannot return sparse outputs and always spreads
positive probability over all positions. In this paper, we propose r-softmax,
a modification of the softmax that outputs a sparse probability distribution
with a controllable sparsity rate. In contrast to existing sparse probability
mapping functions, we provide an intuitive mechanism for controlling the output
sparsity level. We show on several multi-label datasets that r-softmax
outperforms other sparse alternatives to softmax and is highly competitive with
the original softmax. We also apply r-softmax to the self-attention module of a
pre-trained transformer language model and demonstrate that it leads to
improved performance when fine-tuning the model on different natural language
processing tasks.
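As a rough illustration of what a softmax with a controllable sparsity rate can look like, the sketch below zeroes out the probabilities of the floor(r*n) smallest logits and renormalizes over the rest. This is a simplified interpretation for illustration only, not the paper's exact r-softmax formulation; the function name and the masking scheme are assumptions.

```python
import numpy as np

def sparse_softmax_with_rate(logits, r):
    """Softmax-like mapping where a fraction r of positions gets zero probability.

    The floor(r * n) smallest logits are masked out and the softmax is
    renormalized over the remaining entries. Simplified sketch, not the
    paper's exact r-softmax definition.
    """
    logits = np.asarray(logits, dtype=float)
    n = logits.size
    num_zeros = min(int(np.floor(r * n)), n - 1)      # always keep at least one entry
    keep = np.ones(n, dtype=bool)
    if num_zeros > 0:
        keep[np.argsort(logits)[:num_zeros]] = False  # drop the smallest logits
    shifted = logits - logits[keep].max()             # numerical stability
    probs = np.where(keep, np.exp(shifted), 0.0)
    return probs / probs.sum()

# r = 0 recovers the ordinary dense softmax; larger r zeroes out more positions.
print(sparse_softmax_with_rate([2.0, 1.0, 0.1, -1.0], r=0.0))
print(sparse_softmax_with_rate([2.0, 1.0, 0.1, -1.0], r=0.5))
```

With r = 0.5 on four logits, two positions receive exactly zero probability and the remaining mass is redistributed among the kept entries, which is the behavior the abstract describes for sparse outputs.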
Related papers
- MultiMax: Sparse and Multi-Modal Attention Learning [60.49318008131978]
SoftMax is a ubiquitous ingredient of modern machine learning algorithms.
We show that sparsity can be achieved by a family of SoftMax variants, but they often require an alternative loss function and do not preserve multi-modality.
We propose MultiMax, which adaptively modulates the output distribution according to the range of the input entries.
arXiv Detail & Related papers (2024-06-03T10:51:43Z) - Binary Hypothesis Testing for Softmax Models and Leverage Score Models [8.06972158448711]
We consider the problem of binary hypothesis testing in the setting of softmax models.
We draw an analogy between the softmax model and the leverage score model.
arXiv Detail & Related papers (2024-05-09T15:56:29Z) - Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification [4.813254903898101]
The logistic-softmax is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification.
We revisit and redesign the logistic-softmax likelihood, which enables control of the a priori confidence level through a temperature parameter (a hedged sketch of this likelihood appears after this list).
Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets.
arXiv Detail & Related papers (2023-10-16T13:20:13Z) - Spectral Aware Softmax for Visible-Infrared Person Re-Identification [123.69049942659285]
Visible-infrared person re-identification (VI-ReID) aims to match specific pedestrian images from different modalities.
Existing methods still follow the softmax loss training paradigm, which is widely used in single-modality classification tasks.
We propose the spectral-aware softmax (SA-Softmax) loss, which can fully explore the embedding space with the modality information.
arXiv Detail & Related papers (2023-02-03T02:57:18Z) - Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z) - Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation [2.3813678058429626]
The softmax function is widely used in artificial neural networks for multi-class classification problems.
In this paper, we provide an empirical study of a simple and concise softmax variant, namely sparse-softmax, to alleviate the problems that the traditional softmax encounters in high-dimensional classification.
arXiv Detail & Related papers (2021-12-23T09:53:38Z) - SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention.
Various attempts to approximate self-attention with linear complexity have been made in natural language processing.
We identify that their limitations are rooted in keeping the softmax self-attention during approximations.
For the first time, a softmax-free transformer or SOFT is proposed.
arXiv Detail & Related papers (2021-10-22T17:57:29Z) - Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling [0.0]
We show that there are more aspects to the Softmax bottleneck in SBRSs.
We propose a simple yet effective method, Dropout and Decoupling (D&D), to alleviate these problems.
Our method significantly improves the accuracy of a variety of Softmax-based SBRS algorithms.
arXiv Detail & Related papers (2021-10-11T16:52:23Z) - Sparse Attention with Linear Units [60.399814410157425]
We introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU.
Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms.
Our analysis shows that ReLA delivers high sparsity rate and head diversity, and the induced cross attention achieves better accuracy with respect to source-target word alignment.
arXiv Detail & Related papers (2021-04-14T17:52:38Z) - Gradient Estimation with Stochastic Softmax Tricks [84.68686389163153]
We introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces.
We find that stochastic softmax tricks can be used to train latent variable models that perform better and discover more latent structure.
arXiv Detail & Related papers (2020-06-15T00:43:44Z)
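For the last entry above, the base trick being generalized is the standard categorical Gumbel-Softmax relaxation: add Gumbel(0, 1) noise to the logits and apply a temperature-scaled softmax to obtain a differentiable approximation of a one-hot sample. The sketch below shows only this basic categorical case, not the paper's extension to combinatorial spaces.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Draw one relaxed categorical sample via the Gumbel-Softmax trick."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / temperature
    y = y - y.max()                      # numerical stability
    probs = np.exp(y)
    return probs / probs.sum()

# Low temperatures give near-one-hot samples; higher ones give smoother vectors.
print(gumbel_softmax_sample([1.0, 0.5, -0.5], temperature=0.1))
```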
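For the "Revisiting Logistic-softmax" entry, the commonly used definition of the logistic-softmax likelihood normalizes logistic sigmoids rather than exponentials. The sketch below follows that definition and scales the logits by a temperature T, which is one plausible placement of the temperature parameter; the paper's exact parameterization may differ.

```python
import numpy as np

def logistic_softmax(logits, temperature=1.0):
    """Logistic-softmax: p_c = sigmoid(f_c / T) / sum_k sigmoid(f_k / T)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    sig = 1.0 / (1.0 + np.exp(-scaled))  # logistic sigmoid of each scaled logit
    return sig / sig.sum()

# Smaller temperatures sharpen the distribution, analogous to softmax temperature.
print(logistic_softmax([2.0, 0.0, -1.0], temperature=0.5))
```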
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.