On Isotropy Calibration of Transformers
- URL: http://arxiv.org/abs/2109.13304v1
- Date: Mon, 27 Sep 2021 18:54:10 GMT
- Title: On Isotropy Calibration of Transformers
- Authors: Yue Ding, Karolis Martinkus, Damian Pascual, Simon Clematide, Roger
Wattenhofer
- Abstract summary: Studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic.
A recent study shows that the embedding space of transformers is locally isotropic, which suggests that these models are already capable of exploiting the expressive capacity of their embedding space.
We conduct an empirical evaluation of state-of-the-art methods for isotropy calibration on transformers and find that they do not provide consistent improvements across models and tasks.
- Score: 10.294618771570985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Different studies of the embedding space of transformer models suggest that
the distribution of contextual representations is highly anisotropic - the
embeddings are distributed in a narrow cone. Meanwhile, static word
representations (e.g., Word2Vec or GloVe) have been shown to benefit from
isotropic spaces. Therefore, previous work has developed methods to calibrate
the embedding space of transformers in order to ensure isotropy. However, a
recent study (Cai et al. 2021) shows that the embedding space of transformers
is locally isotropic, which suggests that these models are already capable of
exploiting the expressive capacity of their embedding space. In this work, we
conduct an empirical evaluation of state-of-the-art methods for isotropy
calibration on transformers and find that they do not provide consistent
improvements across models and tasks. These results support the thesis that,
given the local isotropy, transformers do not benefit from additional isotropy
calibration.
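As a rough illustration of the quantities discussed above (not code from the paper), the sketch below estimates anisotropy as the average cosine similarity between randomly sampled embedding pairs and applies a simple calibration baseline that centers the embeddings and removes their top principal directions. The synthetic embeddings, function names, and the choice of removing three components are assumptions made for the example only.

```python
# Minimal sketch (illustrative, not the paper's method): anisotropy estimation
# and a simple centering + top-component-removal calibration.
# Assumes embeddings arrive as a NumPy array of shape (n_tokens, dim); in
# practice they would be a transformer's contextual hidden states.
import numpy as np

def avg_cosine_similarity(emb: np.ndarray, n_pairs: int = 10_000, seed: int = 0) -> float:
    """Anisotropy proxy: mean cosine similarity of randomly sampled embedding pairs.
    Values near 1 indicate a narrow cone; values near 0 indicate isotropy."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, emb.shape[0], size=n_pairs)
    j = rng.integers(0, emb.shape[0], size=n_pairs)
    a, b = emb[i], emb[j]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    )
    return float(cos.mean())

def calibrate(emb: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Illustrative isotropy calibration: center the embeddings and project out
    the top principal directions (a common post-processing baseline)."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]                   # (n_components, dim)
    return centered - centered @ top.T @ top  # remove components along top directions

if __name__ == "__main__":
    # Synthetic stand-in for contextual embeddings: a shared offset creates anisotropy.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(2000, 768)) + 5.0  # large common mean -> narrow cone
    print("before calibration:", avg_cosine_similarity(emb))
    print("after calibration: ", avg_cosine_similarity(calibrate(emb)))
```

On the synthetic data, the average cosine similarity drops from close to 1 before calibration to close to 0 afterwards, which is the kind of effect isotropy calibration methods aim for on real contextual embeddings.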
Related papers
- Identification of Mean-Field Dynamics using Transformers [3.8916312075738273]
This paper investigates the use of transformer architectures to approximate the mean-field dynamics of particle systems exhibiting collective behavior.
Specifically, we prove that if a finite-dimensional transformer can effectively approximate the finite-dimensional vector field governing the particle system, then the expected output of this transformer provides a good approximation for the infinite-dimensional mean-field vector field.
arXiv Detail & Related papers (2024-10-06T19:47:24Z) - Transformers Handle Endogeneity in In-Context Linear Regression [34.458004744956334]
We show that transformers inherently possess a mechanism to handle endogeneity effectively using instrumental variables (IV).
We propose an in-context pretraining scheme and provide theoretical guarantees showing that the global minimizer of the pre-training loss achieves a small excess loss.
arXiv Detail & Related papers (2024-10-02T06:21:04Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms based on low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Transformers can optimally learn regression mixture models [22.85684729248361]
We show that transformers can learn an optimal predictor for mixtures of regressions.
Experiments also demonstrate that transformers can learn mixtures of regressions in a sample-efficient fashion.
We prove constructively that the decision-theoretic optimal procedure is indeed implementable by a transformer.
arXiv Detail & Related papers (2023-11-14T18:09:15Z) - All Roads Lead to Rome? Exploring the Invariance of Transformers'
Representations [69.3461199976959]
We propose a model based on invertible neural networks, BERT-INN, to learn the bijection posited by the Bijection Hypothesis.
We show the advantage of BERT-INN both theoretically and through extensive experiments.
arXiv Detail & Related papers (2023-05-23T22:30:43Z) - Latent Positional Information is in the Self-Attention Variance of
Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - XAI for Transformers: Better Explanations through Conservative
Propagation [60.67748036747221]
We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction.
Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
arXiv Detail & Related papers (2022-02-15T10:47:11Z) - Pathologies in priors and inference for Bayesian transformers [71.97183475225215]
So far, there have been no successful attempts to improve transformer models in terms of predictive uncertainty using Bayesian inference.
We find that weight-space inference in transformers does not work well, regardless of the approximate posterior.
We propose a novel method based on the implicit reparameterization of the Dirichlet distribution to apply variational inference directly to the attention weights.
arXiv Detail & Related papers (2021-10-08T10:35:27Z)