All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
- URL: http://arxiv.org/abs/2109.04404v1
- Date: Thu, 9 Sep 2021 16:45:15 GMT
- Title: All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
- Authors: William Timkey, Marten van Schijndel
- Abstract summary: We call into question the informativity of standard similarity measures for contextualized language models.
We find that a small number of rogue dimensions, often just 1-3, dominate similarity measures.
- Score: 5.203329540700176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Similarity measures are a vital tool for understanding how language models
represent and process language. Standard representational similarity measures
such as cosine similarity and Euclidean distance have been successfully used in
static word embedding models to understand how words cluster in semantic space.
Recently, these measures have been applied to embeddings from contextualized
models such as BERT and GPT-2. In this work, we call into question the
informativity of such measures for contextualized language models. We find that
a small number of rogue dimensions, often just 1-3, dominate these measures.
Moreover, we find a striking mismatch between the dimensions that dominate
similarity measures and those which are important to the behavior of the model.
We show that simple postprocessing techniques such as standardization are able
to correct for rogue dimensions and reveal underlying representational quality.
We argue that accounting for rogue dimensions is essential for any
similarity-based analysis of contextual language models.
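The core finding can be illustrated with a short sketch. The code below is not the authors' implementation; it uses synthetic embeddings with one artificially inflated "rogue" dimension to show how that single dimension can dominate cosine similarity, and how per-dimension standardization (one of the postprocessing techniques the abstract mentions) corrects for it.

```python
# Minimal synthetic sketch: a single high-magnitude "rogue" dimension
# dominates cosine similarity; per-dimension standardization corrects it.
import numpy as np

rng = np.random.default_rng(0)

# 100 synthetic "contextual embeddings" with 768 dimensions each.
X = rng.normal(size=(100, 768))
# Make dimension 0 a rogue dimension: a large shared offset.
X[:, 0] += 50.0

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Before correction: any two embeddings look similar, because the
# shared rogue dimension dominates the dot product.
raw_sim = cosine(X[0], X[1])

# Standardization: subtract each dimension's mean and divide by its
# standard deviation, computed across the sample of embeddings.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std_sim = cosine(Z[0], Z[1])

print(f"raw cosine:          {raw_sim:.3f}")  # inflated by the rogue dimension
print(f"standardized cosine: {std_sim:.3f}")  # near zero for unrelated vectors
```

After standardization, the cosine between two unrelated vectors drops toward zero, since no single dimension's scale can swamp the others.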
Related papers
- A statistically consistent measure of Semantic Variability using Language Models [3.4933610074113464]
We present a measure of semantic variability that is statistically consistent under mild assumptions.
This measure, denoted semantic spectral entropy, is an easy-to-implement algorithm that requires only off-the-shelf language models.
arXiv Detail & Related papers (2025-02-01T17:55:58Z)
- A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Crosslinguistic Case Study of Supplementary Adverbs [50.982315553104975]
Semantic map models (SMMs) construct a network-like conceptual space from cross-linguistic instances or forms.
Most SMMs are manually built by human experts using bottom-up procedures.
We propose a novel graph-based algorithm that automatically generates conceptual spaces and SMMs in a top-down manner.
arXiv Detail & Related papers (2024-12-02T12:06:41Z) - Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z) - The Shape of Word Embeddings: Quantifying Non-Isometry With Topological Data Analysis [10.242373477945376]
We use persistent homology from topological data analysis to measure the distances between language pairs from the shape of their unlabeled embeddings.
To distinguish whether these differences are random training errors or capture real information about the languages, we use the computed distance matrices to construct language phylogenetic trees over 81 Indo-European languages.
arXiv Detail & Related papers (2024-03-30T23:51:25Z) - Probing Physical Reasoning with Counter-Commonsense Context [34.8562766828087]
This study investigates how physical commonsense affects the contextualized size comparison task.
This dataset tests the ability of language models to predict the size relationship between objects under various contexts.
arXiv Detail & Related papers (2023-06-04T04:24:43Z) - Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - Similarity between Units of Natural Language: The Transition from Coarse
to Fine Estimation [0.0]
Capturing the similarities between human language units is crucial for explaining how humans associate different objects.
My research goal in this thesis is to develop regression models that account for similarities between language units in a more refined way.
arXiv Detail & Related papers (2022-10-25T18:54:32Z) - Exploring Dimensionality Reduction Techniques in Multilingual
Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z) - Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z) - Using Distributional Principles for the Semantic Study of Contextual
Language Models [7.284661356980247]
We first focus on these properties for English by exploiting the distributional principle of substitution as a probing mechanism in the controlled context of SemCor and WordNet paradigmatic relations.
We then propose to adapt the same method to a more open setting for characterizing the differences between static and contextual language models.
arXiv Detail & Related papers (2021-11-23T22:21:16Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterward, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.