On the rate of convergence of a classifier based on a Transformer
encoder
- URL: http://arxiv.org/abs/2111.14574v1
- Date: Mon, 29 Nov 2021 14:58:29 GMT
- Title: On the rate of convergence of a classifier based on a Transformer encoder
- Authors: Iryna Gurevych, Michael Kohler, Gözde Gül Sahin
- Abstract summary: The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed.
It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.
- Score: 55.41148606254641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pattern recognition based on a high-dimensional predictor is considered. A
classifier is defined which is based on a Transformer encoder. The rate of
convergence of the misclassification probability of the classifier towards the
optimal misclassification probability is analyzed. It is shown that this
classifier is able to circumvent the curse of dimensionality provided the
a posteriori probability satisfies a suitable hierarchical composition model.
Furthermore, the difference between the Transformer classifiers analyzed
theoretically in this paper and the Transformer classifiers used in practice
today is illustrated by considering classification problems in natural
language processing.
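As a toy illustration of the kind of model the abstract refers to (not the estimator analyzed in the paper), the following minimal sketch applies single-head scaled dot-product self-attention with identity query/key/value projections, mean-pools the attended token vectors, and scores the result against hypothetical class weight vectors; all dimensions and weights here are illustrative assumptions.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    # tokens: list of d-dim vectors; identity Q/K/V projections for brevity
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # weighted sum of the value vectors (here the tokens themselves)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def classify(tokens, class_vectors):
    # mean-pool the attended representations, then score against class vectors
    attended = self_attention(tokens)
    d = len(tokens[0])
    pooled = [sum(t[j] for t in attended) / len(attended) for j in range(d)]
    logits = [sum(p * c for p, c in zip(pooled, w)) for w in class_vectors]
    return softmax(logits)

# symmetric toy input: both classes end up equally likely
probs = classify([[1.0, 0.0], [0.0, 1.0]], class_vectors=[[1.0, 0.0], [0.0, 1.0]])
```

A real Transformer encoder adds learned projections, multiple heads, feed-forward layers, and residual connections; this sketch only shows the attention-pool-score skeleton.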
Related papers
- Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier [0.0]
Logistic regression on the output representation of the transformer neural network layer is most often used to probe the syntactic properties of the language model.
We show that using gradient boosting decision trees at the Knowledge Neuron layer is more advantageous than using logistic regression on the output representations of the transformer layer.
arXiv Detail & Related papers (2023-12-17T15:37:03Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Soft-margin classification of object manifolds [0.0]
A neural population responding to multiple appearances of a single object defines a manifold in the neural response space.
The ability to classify such manifolds is of interest, as object recognition and other computational tasks require a response that is insensitive to variability within a manifold.
Soft-margin classifiers are a larger class of algorithms and provide an additional regularization parameter used in applications to optimize performance outside the training set.
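The regularization parameter mentioned above can be illustrated with the standard soft-margin objective for a linear classifier: hinge loss plus L2 regularization, where the constant C trades margin width against training violations. This is a generic SVM-style sketch, not the paper's manifold setting.

```python
def hinge_losses(w, b, X, y):
    # y in {-1, +1}; margin violations incur a linear cost
    return [max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, x)) + b))
            for x, yi in zip(X, y)]

def soft_margin_objective(w, b, X, y, C=1.0):
    # C is the soft-margin knob: larger C penalizes violations more heavily
    reg = 0.5 * sum(wj * wj for wj in w)
    return reg + C * sum(hinge_losses(w, b, X, y))
```

With C large the classifier approaches the hard-margin solution; with C small it tolerates points inside the margin, which is the knob typically tuned for out-of-sample performance.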
arXiv Detail & Related papers (2022-03-14T12:23:36Z)
- Exploring Category-correlated Feature for Few-shot Image Classification [27.13708881431794]
We present a simple yet effective feature rectification method by exploring the category correlation between novel and base classes as the prior knowledge.
The proposed approach consistently obtains considerable performance gains on three widely used benchmarks.
arXiv Detail & Related papers (2021-12-14T08:25:24Z)
- Weight Vector Tuning and Asymptotic Analysis of Binary Linear Classifiers [82.5915112474988]
This paper proposes weight vector tuning of a generic binary linear classifier through the parameterization of a decomposition of the discriminant by a scalar.
It is also found that weight vector tuning significantly improves the performance of Linear Discriminant Analysis (LDA) under high estimation noise.
arXiv Detail & Related papers (2021-10-01T17:50:46Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
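The re-adjustment idea can be sketched as a Sinkhorn-style alternating normalization over a batch of predicted distributions: columns are scaled toward assumed class priors, then rows are renormalized to sum to one. This is an illustrative sketch of the alternating-normalization idea, not the paper's exact CAN procedure; the priors and iteration count are assumptions.

```python
def alternating_normalize(P, priors, iters=2):
    # P: each row is one example's predicted class distribution
    n, k = len(P), len(P[0])
    P = [row[:] for row in P]
    for _ in range(iters):
        # scale each column toward its assumed class prior
        col = [sum(P[i][j] for i in range(n)) for j in range(k)]
        P = [[P[i][j] * priors[j] / col[j] for j in range(k)] for i in range(n)]
        # renormalize each row back to a probability distribution
        row = [sum(P[i]) for i in range(n)]
        P = [[P[i][j] / row[i] for j in range(k)] for i in range(n)]
    return P
```

Each output row remains a valid distribution, while the column scaling nudges the batch toward the assumed prior, which is how such post-processing can sharpen uncertain predictions without retraining.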
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- A Multiple Classifier Approach for Concatenate-Designed Neural Networks [13.017053017670467]
We present the design of the classifiers, which collect the features produced between the network sets.
We use the L2 normalization method to obtain the classification score instead of a softmax dense layer.
As a result, the proposed classifiers are able to improve the accuracy in the experimental cases.
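Replacing a softmax dense layer with an L2-normalized score amounts to scoring by cosine similarity between the normalized feature vector and normalized per-class weight vectors. The sketch below is a generic illustration under that reading, with made-up weight vectors; it is not the paper's exact architecture.

```python
import math

def l2_normalize(v):
    # scale a vector to unit length (guard against the zero vector)
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cosine_scores(feature, class_weights):
    # score = cosine similarity between the feature and each class weight vector
    f = l2_normalize(feature)
    return [sum(fi * wi for fi, wi in zip(f, l2_normalize(w))) for w in class_weights]
```

Because both sides are unit-length, every score lies in [-1, 1], which removes the scale sensitivity of raw logits.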
arXiv Detail & Related papers (2021-01-14T04:32:40Z)
- Consistency Regularization for Certified Robustness of Smoothed Classifiers [89.72878906950208]
A recent technique of randomized smoothing has shown that the worst-case $\ell_2$-robustness can be transformed into the average-case robustness.
We found that the trade-off between accuracy and certified robustness of smoothed classifiers can be greatly controlled by simply regularizing the prediction consistency over noise.
arXiv Detail & Related papers (2020-06-07T06:57:43Z)
- Quantifying the Uncertainty of Precision Estimates for Rule based Text Classifiers [0.0]
Rule based classifiers that use the presence and absence of key sub-strings to make classification decisions have a natural mechanism for quantifying the uncertainty of their precision.
For a binary classifier, the key insight is to treat partitions of the sub-string set induced by the documents as Bernoulli random variables.
The utility of this approach is demonstrated with a benchmark problem.
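Once each positive prediction is viewed as a Bernoulli trial (correct or incorrect), a standard way to turn the trial counts into an uncertainty estimate for precision is the Wilson score interval, shown below as a generic illustration rather than the paper's exact construction.

```python
import math

def wilson_interval(tp, fp, z=1.96):
    # treat each positive prediction as a Bernoulli trial; z=1.96 gives ~95% coverage
    n = tp + fp
    p = tp / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return ((centre - half) / denom, (centre + half) / denom)
```

For 90 true positives out of 100 positive predictions this yields roughly (0.83, 0.94), making explicit how much the point estimate of 0.9 could move.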
arXiv Detail & Related papers (2020-05-19T03:51:47Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
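The core randomized-smoothing construction behind such certificates can be sketched as a Monte Carlo majority vote of a base classifier over Gaussian-noised copies of the input; the base classifier, noise level, and sample count below are illustrative assumptions, and real certification additionally bounds the vote statistically.

```python
import random
from collections import Counter

def smoothed_classify(base_classify, x, sigma=0.5, samples=200, seed=0):
    # Monte Carlo estimate of the smoothed classifier's majority class
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classify(noisy)] += 1
    return votes.most_common(1)[0][0]
```

The margin of the winning vote is what certification procedures convert into a robustness radius around `x`.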
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.