All Bark and No Bite: Rogue Dimensions in Transformer Language Models
Obscure Representational Quality
- URL: http://arxiv.org/abs/2109.04404v1
- Date: Thu, 9 Sep 2021 16:45:15 GMT
- Title: All Bark and No Bite: Rogue Dimensions in Transformer Language Models
Obscure Representational Quality
- Authors: William Timkey, Marten van Schijndel
- Abstract summary: We call into question the informativity of standard similarity measures, such as cosine similarity and Euclidean distance, for contextualized language models.
We find that a small number of rogue dimensions, often just 1-3, dominate similarity measures.
- Score: 5.203329540700176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Similarity measures are a vital tool for understanding how language models
represent and process language. Standard representational similarity measures
such as cosine similarity and Euclidean distance have been successfully used in
static word embedding models to understand how words cluster in semantic space.
Recently, these measures have been applied to embeddings from contextualized
models such as BERT and GPT-2. In this work, we call into question the
informativity of such measures for contextualized language models. We find that
a small number of rogue dimensions, often just 1-3, dominate these measures.
Moreover, we find a striking mismatch between the dimensions that dominate
similarity measures and those which are important to the behavior of the model.
We show that simple postprocessing techniques such as standardization are able
to correct for rogue dimensions and reveal underlying representational quality.
We argue that accounting for rogue dimensions is essential for any
similarity-based analysis of contextual language models.
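The effect described in the abstract can be illustrated on synthetic data. The sketch below is not the authors' code: it assumes that "dominating" a measure can be read as one dimension supplying most of the per-dimension terms u_i * v_i / (||u|| ||v||) that sum to cosine similarity, and it uses hypothetical toy vectors in which dimension 0 has a large mean. It then applies per-dimension standardization (z-scoring), the postprocessing step the abstract names.

```python
import math
import random


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in u))


def mean_contributions(vectors):
    """Average per-dimension contribution to cosine similarity over all pairs.

    For a pair (u, v), dimension d contributes u[d] * v[d] / (||u|| * ||v||);
    the contributions of all dimensions sum to cos(u, v).
    """
    dims = len(vectors[0])
    norms = [norm(v) for v in vectors]
    totals = [0.0] * dims
    pairs = 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            scale = norms[i] * norms[j]
            for d in range(dims):
                totals[d] += vectors[i][d] * vectors[j][d] / scale
            pairs += 1
    return [t / pairs for t in totals]


def standardize(vectors):
    """Z-score every dimension using the mean and std of this sample."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n)
        for d in range(dims)
    ]
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in vectors]


# Hypothetical toy embeddings: dimension 0 is "rogue" (large mean and scale),
# the remaining 7 dimensions are standard normal noise.
rng = random.Random(0)
vectors = [
    [rng.gauss(50.0, 5.0)] + [rng.gauss(0.0, 1.0) for _ in range(7)]
    for _ in range(200)
]

raw = mean_contributions(vectors)
raw_mean = sum(raw)                  # mean cosine similarity before postprocessing
raw_dim0_share = raw[0] / raw_mean   # fraction of it supplied by dimension 0
std_mean = sum(mean_contributions(standardize(vectors)))  # after z-scoring

print("mean cosine (raw):", raw_mean)
print("share from dimension 0:", raw_dim0_share)
print("mean cosine (standardized):", std_mean)
```

On this toy sample, random pairs look almost identical under raw cosine similarity because of a single dimension. After standardization the average similarity falls close to zero, so the remaining dimensions can influence the measure.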
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Investigating semantic subspaces of Transformer sentence embeddings through linear structural probing [2.5002227227256864]
We present experiments with semantic structural probing, a method for studying sentence-level representations.
We apply our method to language models from different families (encoder-only, decoder-only, encoder-decoder) and of different sizes in the context of two tasks.
We find that model families differ substantially in their performance and layer dynamics, but that the results are largely model-size invariant.
arXiv Detail & Related papers (2023-10-18T12:32:07Z)
- Probing Physical Reasoning with Counter-Commonsense Context [34.8562766828087]
This study investigates how physical commonsense affects the contextualized size comparison task.
This dataset tests the ability of language models to predict the size relationship between objects under various contexts.
arXiv Detail & Related papers (2023-06-04T04:24:43Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Similarity between Units of Natural Language: The Transition from Coarse to Fine Estimation [0.0]
Capturing the similarities between human language units is crucial for explaining how humans associate different objects.
My research goal in this thesis is to develop regression models that account for similarities between language units in a more refined way.
arXiv Detail & Related papers (2022-10-25T18:54:32Z)
- Visual Comparison of Language Model Adaptation [55.92129223662381]
Adapters are lightweight alternatives for model adaptation.
In this paper, we discuss several design alternatives for interactive, comparative visual explanation methods.
We show that, for instance, an adapter trained on the language debiasing task according to context-0 embeddings introduces a new type of bias.
arXiv Detail & Related papers (2022-08-17T09:25:28Z)
- Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Bilingual Topic Models for Comparable Corpora [9.509416095106491]
We propose a binding mechanism between the distributions of the paired documents.
To estimate the similarity of documents that are written in different languages we use cross-lingual word embeddings that are learned with shallow neural networks.
We evaluate the proposed binding mechanism by extending two topic models: a bilingual adaptation of LDA that assumes bag-of-words inputs and a model that incorporates part of the text structure in the form of boundaries of semantically coherent segments.
arXiv Detail & Related papers (2021-11-30T10:53:41Z)
- Using Distributional Principles for the Semantic Study of Contextual Language Models [7.284661356980247]
We first focus on these properties for English by exploiting the distributional principle of substitution as a probing mechanism in the controlled context of SemCor and WordNet paradigmatic relations.
We then propose to adapt the same method to a more open setting for characterizing the differences between static and contextual language models.
arXiv Detail & Related papers (2021-11-23T22:21:16Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.