Enhancing Classifier Conservativeness and Robustness by Polynomiality
- URL: http://arxiv.org/abs/2203.12693v1
- Date: Wed, 23 Mar 2022 19:36:19 GMT
- Title: Enhancing Classifier Conservativeness and Robustness by Polynomiality
- Authors: Ziqi Wang, Marco Loog
- Abstract summary: We show how polynomiality can remedy the situation.
A directly related, simple, yet important technical novelty we subsequently present is softRmax.
We show that two aspects of softRmax, conservativeness and inherent gradient regularization, lead to robustness against adversarial attacks.
- Score: 23.099278014212146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We illustrate the detrimental effect, such as overconfident decisions, that
exponential behavior can have in methods like classical LDA and logistic
regression. We then show how polynomiality can remedy the situation. This,
among others, leads purposefully to random-level performance in the tails, away
from the bulk of the training data. A directly related, simple, yet important
technical novelty we subsequently present is softRmax: a reasoned alternative
to the standard softmax function employed in contemporary (deep) neural
networks. It is derived through linking the standard softmax to Gaussian
class-conditional models, as employed in LDA, and replacing those by a
polynomial alternative. We show that two aspects of softRmax, conservativeness
and inherent gradient regularization, lead to robustness against adversarial
attacks without gradient obfuscation.
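The derivation sketched above (link the standard softmax to Gaussian class-conditional models, then replace the Gaussian with a polynomial) can be illustrated with a short numerical sketch. The code below is a minimal, hedged illustration only: it assumes class-conditional weights of the inverse-even-power form ||x - mu_c||^(-2n), hypothetical class centers `mu`, and a degree parameter `n`; the exact functional form and normalization used in the paper may differ.

```python
import numpy as np

def softmax(z):
    """Standard softmax over logits z."""
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softrmax_sketch(x, mu, n=1, eps=1e-12):
    """Illustrative polynomial alternative to softmax (not necessarily the paper's exact form).

    The Gaussian class-conditional exp(-||x - mu_c||^2 / 2) behind the standard
    softmax is replaced by the polynomial ||x - mu_c||^(-2n), giving
        p(c | x) = ||x - mu_c||^(-2n) / sum_k ||x - mu_k||^(-2n).
    """
    d2 = ((x[None, :] - mu) ** 2).sum(axis=-1)   # squared distance to each class center
    w = 1.0 / (d2 ** n + eps)                    # polynomial instead of exponential weighting
    return w / w.sum()

# Two hypothetical class centers.
mu = np.array([[0.0, 0.0], [4.0, 0.0]])

# Near the data: both rules are confident about class 0.
x_near = np.array([0.5, 0.0])
print(softrmax_sketch(x_near, mu))                                  # ~[0.98, 0.02]
print(softmax(-0.5 * ((x_near[None, :] - mu) ** 2).sum(axis=-1)))   # ~[1.00, 0.00]

# Far in the tails: the polynomial posterior approaches the uniform
# (random-level) distribution, while the Gaussian/softmax posterior
# saturates to an overconfident one-hot prediction.
x_far = np.array([200.0, 150.0])
print(softrmax_sketch(x_far, mu))                                   # ~[0.49, 0.51]
print(softmax(-0.5 * ((x_far[None, :] - mu) ** 2).sum(axis=-1)))    # ~[0.00, 1.00]
```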
Related papers
- A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - Bridging Discrete and Backpropagation: Straight-Through and Beyond [62.46558842476455]
We propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables.
We propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs.
arXiv Detail & Related papers (2023-04-17T20:59:49Z) - r-softmax: Generalized Softmax with Controllable Sparsity Rate [11.39524236962986]
We propose r-softmax, a modification of the softmax, outputting a sparse probability distribution with a controllable sparsity rate.
We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax.
arXiv Detail & Related papers (2023-04-11T14:28:29Z) - Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z) - Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation [2.3813678058429626]
The softmax function is widely used in artificial neural networks for multiclass classification problems.
In this paper, we provide an empirical study on a simple and concise softmax variant, namely sparse-softmax, to alleviate the problems that traditional softmax encounters in high-dimensional classification.
arXiv Detail & Related papers (2021-12-23T09:53:38Z) - Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling [0.0]
We show that there are more aspects to the Softmax bottleneck in SBRSs.
We propose a simple yet effective method, Dropout and Decoupling (D&D), to alleviate these problems.
Our method significantly improves the accuracy of a variety of Softmax-based SBRS algorithms.
arXiv Detail & Related papers (2021-10-11T16:52:23Z) - Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism [8.007523868483085]
Softmax is widely used in neural networks for multiclass classification, gating structures, and attention mechanisms.
In this work, we suggest replacing the exponential function with periodic functions, and we delve into some potential periodic alternatives of Softmax.
Our method is shown to alleviate the gradient problem and to yield substantial improvements over Softmax and its variants.
arXiv Detail & Related papers (2021-08-16T15:26:31Z) - Sparse Attention with Linear Units [60.399814410157425]
We introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU (see the illustrative sketch after this list).
Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms.
Our analysis shows that ReLA delivers high sparsity rate and head diversity, and the induced cross attention achieves better accuracy with respect to source-target word alignment.
arXiv Detail & Related papers (2021-04-14T17:52:38Z) - Meta-Solver for Neural Ordinary Differential Equations [77.8918415523446]
We investigate how variability in the solver space can improve the performance of neural ODEs.
We show that the right choice of solver parameterization can significantly affect the robustness of neural ODE models to adversarial attacks.
arXiv Detail & Related papers (2021-03-15T17:26:34Z) - Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - Maximin Optimization for Binary Regression [24.351803097593887]
Regression problems with binary weights are ubiquitous in quantized learning models and digital communication systems.
The Lagrangian method also performs well in regression with cross-entropy loss, as well as in non-neural multi-layer saddle-point optimization.
arXiv Detail & Related papers (2020-10-10T19:47:40Z)
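Relating to the Rectified Linear Attention (ReLA) entry above, the substitution it describes (a ReLU in place of the softmax over attention scores) can be sketched in a few lines. The per-query sum-normalization below is an illustrative assumption; the paper pairs the ReLU with its own normalization scheme, which is not reproduced here.

```python
import numpy as np

def rela_attention_sketch(Q, K, V, eps=1e-9):
    """Scaled dot-product attention with ReLU instead of softmax (illustrative).

    Negative attention scores are zeroed out, so many weights are exactly
    zero and the attention pattern is sparse. The normalization here is a
    simple per-query sum-normalization chosen for illustration only.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # (n_queries, n_keys) scaled dot products
    weights = np.maximum(scores, 0.0)        # ReLU replaces softmax -> sparsity
    weights = weights / (weights.sum(axis=-1, keepdims=True) + eps)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 5, 8))     # 5 tokens, head dimension 8
print(rela_attention_sketch(Q, K, V).shape)  # (5, 8)
```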
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this automatically generated information and is not responsible for any consequences of its use.