Related papers: All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality

URL: http://arxiv.org/abs/2109.04404v1
Date: Thu, 9 Sep 2021 16:45:15 GMT
Title: All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality
Authors: William Timkey, Marten van Schijndel
Abstract summary: We call into question the informativity of standard similarity measures, such as cosine similarity and Euclidean distance, for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures.
Score: 5.203329540700176
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Similarity measures are a vital tool for understanding how language models represent and process language. Standard representational similarity measures such as cosine similarity and Euclidean distance have been successfully used in static word embedding models to understand how words cluster in semantic space. Recently, these measures have been applied to embeddings from contextualized models such as BERT and GPT-2. In this work, we call into question the informativity of such measures for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures. Moreover, we find a striking mismatch between the dimensions that dominate similarity measures and those which are important to the behavior of the model. We show that simple postprocessing techniques such as standardization are able to correct for rogue dimensions and reveal underlying representational quality. We argue that accounting for rogue dimensions is essential for any similarity-based analysis of contextual language models.
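The effect described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' code: the embedding matrix, the planted rogue dimension (index 7), and the helper names `cosine_contributions` and `standardize` are all assumptions made for illustration. The sketch splits cosine similarity into per-dimension terms, shows one large-magnitude dimension dominating it, and then applies per-dimension standardization (z-scoring), which is one reading of the "standardization" postprocessing the abstract mentions.

```python
import numpy as np


def cosine_contributions(x, y):
    """Per-dimension terms of cosine similarity for paired rows of x and y.

    Each row of the result sums to the cosine similarity of that pair.
    """
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    return (x * y) / norms[:, None]


def standardize(emb):
    """Z-score every dimension across the corpus (assumed postprocessing)."""
    return (emb - emb.mean(axis=0)) / emb.std(axis=0)


rng = np.random.default_rng(0)

# Synthetic "contextual embeddings": 2000 vectors, 64 dimensions.
emb = rng.normal(size=(2000, 64))
# Plant one hypothetical rogue dimension with a large mean and variance.
ROGUE = 7
emb[:, ROGUE] = 50.0 + 5.0 * rng.normal(size=2000)

# Pair each vector with its neighbour; rows are i.i.d., so pairs are random.
partner = np.roll(emb, 1, axis=0)
raw_contrib = cosine_contributions(emb, partner).mean(axis=0)

z = standardize(emb)
z_contrib = cosine_contributions(z, np.roll(z, 1, axis=0)).mean(axis=0)

print("mean cosine (raw):          ", raw_contrib.sum())
print("share from rogue dim (raw): ", raw_contrib[ROGUE] / raw_contrib.sum())
print("mean cosine (standardized): ", z_contrib.sum())
print("largest |contribution| (standardized):", np.abs(z_contrib).max())
```

In this toy setup the raw average cosine similarity between random pairs is high almost entirely because of the single planted dimension. After standardization no dimension stands out and the average similarity falls to near zero. Real model embeddings will behave less cleanly than this synthetic case.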

Related papers

Shared Global and Local Geometry of Language Model Embeddings [46.33317507982751]
We find that token embeddings of language models exhibit common geometric structure. We show that tokens with lower intrinsic dimensions often have semantically coherent clusters, while those with higher intrinsic dimensions do not. Perhaps most surprisingly, we find that alignment in token embeddings persists through the hidden states of language models.
arXiv Detail & Related papers (2025-03-27T01:17:06Z)
A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Crosslinguistic Case Study of Supplementary Adverbs [50.982315553104975]
Semantic map models (SMMs) construct a network-like conceptual space from cross-linguistic instances or forms. Most SMMs are manually built by human experts using bottom-up procedures. We propose a novel graph-based algorithm that automatically generates conceptual spaces and SMMs in a top-down manner.
arXiv Detail & Related papers (2024-12-02T12:06:41Z)
Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs). We form "semantic tokens" by merging semantically similar subwords and their embeddings. Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
The Shape of Word Embeddings: Quantifying Non-Isometry With Topological Data Analysis [10.242373477945376]
We use persistent homology from topological data analysis to measure the distances between language pairs from the shape of their unlabeled embeddings. To distinguish whether these differences are random training errors or capture real information about the languages, we use the computed distance matrices to construct language phylogenetic trees over 81 Indo-European languages.
arXiv Detail & Related papers (2024-03-30T23:51:25Z)
Probing Physical Reasoning with Counter-Commonsense Context [34.8562766828087]
This study investigates how physical commonsense affects the contextualized size comparison task. This dataset tests the ability of language models to predict the size relationship between objects under various contexts.
arXiv Detail & Related papers (2023-06-04T04:24:43Z)
Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings. Our model operates on parallel data in $N$ languages. We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
Similarity between Units of Natural Language: The Transition from Coarse to Fine Estimation [0.0]
Capturing the similarities between human language units is crucial for explaining how humans associate different objects. My research goal in this thesis is to develop regression models that account for similarities between language units in a more refined way.
arXiv Detail & Related papers (2022-10-25T18:54:32Z)
Visual Comparison of Language Model Adaptation [55.92129223662381]
Adapters are lightweight alternatives for model adaptation. In this paper, we discuss several design choices and alternatives for interactive, comparative visual explanation methods. We show that, for instance, an adapter trained on the language debiasing task according to context-0 embeddings introduces a new type of bias.
arXiv Detail & Related papers (2022-08-17T09:25:28Z)
Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers. It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing. Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
Bilingual Topic Models for Comparable Corpora [9.509416095106491]
We propose a binding mechanism between the distributions of the paired documents. To estimate the similarity of documents that are written in different languages we use cross-lingual word embeddings that are learned with shallow neural networks. We evaluate the proposed binding mechanism by extending two topic models: a bilingual adaptation of LDA that assumes bag-of-words inputs and a model that incorporates part of the text structure in the form of boundaries of semantically coherent segments.
arXiv Detail & Related papers (2021-11-30T10:53:41Z)
Using Distributional Principles for the Semantic Study of Contextual Language Models [7.284661356980247]
We first focus on these properties for English by exploiting the distributional principle of substitution as a probing mechanism in the controlled context of SemCor and WordNet paradigmatic relations. We then propose to adapt the same method to a more open setting for characterizing the differences between static and contextual language models.
arXiv Detail & Related papers (2021-11-23T22:21:16Z)
Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size. We propose a fully compositional output embedding layer for language models. To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.