On the rate of convergence of a classifier based on a Transformer
encoder
- URL: http://arxiv.org/abs/2111.14574v1
- Date: Mon, 29 Nov 2021 14:58:29 GMT
- Title: On the rate of convergence of a classifier based on a Transformer encoder
- Authors: Iryna Gurevych, Michael Kohler, Gözde Gül Sahin
- Abstract summary: The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed.
It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.
- Score: 55.41148606254641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pattern recognition based on a high-dimensional predictor is considered. A
classifier is defined which is based on a Transformer encoder. The rate of
convergence of the misclassification probability of the classifier towards the
optimal misclassification probability is analyzed. It is shown that this
classifier is able to circumvent the curse of dimensionality provided the
a posteriori probability satisfies a suitable hierarchical composition model.
Furthermore, the difference between the Transformer classifiers analyzed
theoretically in this paper and the Transformer classifiers used in practice
today is illustrated by considering classification problems in natural
language processing.
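As a toy illustration of the kind of model the abstract refers to (not the estimator analyzed in the paper), the following minimal sketch applies single-head scaled dot-product self-attention with identity query/key/value projections, mean-pools the attended token vectors, and scores the result against hypothetical class weight vectors; all dimensions and weights here are illustrative assumptions.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    # tokens: list of d-dim vectors; identity Q/K/V projections for brevity
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # weighted sum of the value vectors (here the tokens themselves)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def classify(tokens, class_vectors):
    # mean-pool the attended representations, then score against class vectors
    attended = self_attention(tokens)
    d = len(tokens[0])
    pooled = [sum(t[j] for t in attended) / len(attended) for j in range(d)]
    logits = [sum(p * c for p, c in zip(pooled, w)) for w in class_vectors]
    return softmax(logits)

# symmetric toy input: both classes end up equally likely
probs = classify([[1.0, 0.0], [0.0, 1.0]], class_vectors=[[1.0, 0.0], [0.0, 1.0]])
```

A real Transformer encoder adds learned projections, multiple heads, feed-forward layers, and residual connections; this sketch only shows the attention-pool-score skeleton.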
Related papers
- Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier [0.0]
Logistic regression on the output representation of the transformer neural network layer is most often used to probe the syntactic properties of the language model.
We show that using gradient boosting decision trees at the Knowledge Neuron layer is more advantageous than using logistic regression on the output representations of the transformer layer.
arXiv Detail & Related papers (2023-12-17T15:37:03Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Soft-margin classification of object manifolds [0.0]
A neural population responding to multiple appearances of a single object defines a manifold in the neural response space.
The ability to classify such manifolds is of interest, as object recognition and other computational tasks require a response that is insensitive to variability within a manifold.
Soft-margin classifiers are a larger class of algorithms and provide an additional regularization parameter used in applications to optimize performance outside the training set.
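The regularization parameter mentioned above can be illustrated with the standard soft-margin objective for a linear classifier: hinge loss plus L2 regularization, where the constant C trades margin width against training violations. This is a generic SVM-style sketch, not the paper's manifold setting.

```python
def hinge_losses(w, b, X, y):
    # y in {-1, +1}; margin violations incur a linear cost
    return [max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, x)) + b))
            for x, yi in zip(X, y)]

def soft_margin_objective(w, b, X, y, C=1.0):
    # C is the soft-margin knob: larger C penalizes violations more heavily
    reg = 0.5 * sum(wj * wj for wj in w)
    return reg + C * sum(hinge_losses(w, b, X, y))
```

With C large the classifier approaches the hard-margin solution; with C small it tolerates points inside the margin, which is the knob typically tuned for out-of-sample performance.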
arXiv Detail & Related papers (2022-03-14T12:23:36Z)
- Exploring Category-correlated Feature for Few-shot Image Classification [27.13708881431794]
We present a simple yet effective feature rectification method by exploring the category correlation between novel and base classes as the prior knowledge.
The proposed approach consistently obtains considerable performance gains on three widely used benchmarks.
arXiv Detail & Related papers (2021-12-14T08:25:24Z)
- Weight Vector Tuning and Asymptotic Analysis of Binary Linear Classifiers [82.5915112474988]
This paper proposes weight vector tuning of a generic binary linear classifier through the parameterization of a decomposition of the discriminant by a scalar.
It is also found that weight vector tuning significantly improves the performance of Linear Discriminant Analysis (LDA) under high estimation noise.
arXiv Detail & Related papers (2021-10-01T17:50:46Z)
- When in Doubt: Improving Classification Performance with Alternating Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
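The re-adjustment idea can be sketched as a Sinkhorn-style alternating normalization over a batch of predicted distributions: columns are scaled toward assumed class priors, then rows are renormalized to sum to one. This is an illustrative sketch of the alternating-normalization idea, not the paper's exact CAN procedure; the priors and iteration count are assumptions.

```python
def alternating_normalize(P, priors, iters=2):
    # P: each row is one example's predicted class distribution
    n, k = len(P), len(P[0])
    P = [row[:] for row in P]
    for _ in range(iters):
        # scale each column toward its assumed class prior
        col = [sum(P[i][j] for i in range(n)) for j in range(k)]
        P = [[P[i][j] * priors[j] / col[j] for j in range(k)] for i in range(n)]
        # renormalize each row back to a probability distribution
        row = [sum(P[i]) for i in range(n)]
        P = [[P[i][j] / row[i] for j in range(k)] for i in range(n)]
    return P
```

Each output row remains a valid distribution, while the column scaling nudges the batch toward the assumed prior, which is how such post-processing can sharpen uncertain predictions without retraining.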
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- A Multiple Classifier Approach for Concatenate-Designed Neural Networks [13.017053017670467]
We present the design of the classifiers, which collect the features produced between the network sets.
We use the L2 normalization method to obtain the classification score instead of a softmax dense layer.
As a result, the proposed classifiers are able to improve the accuracy in the experimental cases.
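Replacing a softmax dense layer with an L2-normalized score amounts to scoring by cosine similarity between the normalized feature vector and normalized per-class weight vectors. The sketch below is a generic illustration under that reading, with made-up weight vectors; it is not the paper's exact architecture.

```python
import math

def l2_normalize(v):
    # scale a vector to unit length (guard against the zero vector)
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cosine_scores(feature, class_weights):
    # score = cosine similarity between the feature and each class weight vector
    f = l2_normalize(feature)
    return [sum(fi * wi for fi, wi in zip(f, l2_normalize(w))) for w in class_weights]
```

Because both sides are unit-length, every score lies in [-1, 1], which removes the scale sensitivity of raw logits.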
arXiv Detail & Related papers (2021-01-14T04:32:40Z)
- Consistency Regularization for Certified Robustness of Smoothed Classifiers [89.72878906950208]
A recent technique of randomized smoothing has shown that the worst-case $\ell_2$-robustness can be transformed into the average-case robustness.
We found that the trade-off between accuracy and certified robustness of smoothed classifiers can be greatly controlled by simply regularizing the prediction consistency over noise.
arXiv Detail & Related papers (2020-06-07T06:57:43Z)
- Quantifying the Uncertainty of Precision Estimates for Rule based Text Classifiers [0.0]
Rule based classifiers that use the presence and absence of key sub-strings to make classification decisions have a natural mechanism for quantifying the uncertainty of their precision.
For a binary classifier, the key insight is to treat partitions of the sub-string set induced by the documents as Bernoulli random variables.
The utility of this approach is demonstrated with a benchmark problem.
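Once each positive prediction is viewed as a Bernoulli trial (correct or incorrect), a standard way to turn the trial counts into an uncertainty estimate for precision is the Wilson score interval, shown below as a generic illustration rather than the paper's exact construction.

```python
import math

def wilson_interval(tp, fp, z=1.96):
    # treat each positive prediction as a Bernoulli trial; z=1.96 gives ~95% coverage
    n = tp + fp
    p = tp / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return ((centre - half) / denom, (centre + half) / denom)
```

For 90 true positives out of 100 positive predictions this yields roughly (0.83, 0.94), making explicit how much the point estimate of 0.9 could move.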
arXiv Detail & Related papers (2020-05-19T03:51:47Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
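The core randomized-smoothing construction behind such certificates can be sketched as a Monte Carlo majority vote of a base classifier over Gaussian-noised copies of the input; the base classifier, noise level, and sample count below are illustrative assumptions, and real certification additionally bounds the vote statistically.

```python
import random
from collections import Counter

def smoothed_classify(base_classify, x, sigma=0.5, samples=200, seed=0):
    # Monte Carlo estimate of the smoothed classifier's majority class
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classify(noisy)] += 1
    return votes.most_common(1)[0][0]
```

The margin of the winning vote is what certification procedures convert into a robustness radius around `x`.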
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.