Related papers: Representations as Language: An Information-Theoretic Framework for Interpretability

Representations as Language: An Information-Theoretic Framework for Interpretability

URL: http://arxiv.org/abs/2406.02449v1
Date: Tue, 4 Jun 2024 16:14:00 GMT
Title: Representations as Language: An Information-Theoretic Framework for Interpretability
Authors: Henry Conklin, Kenny Smith,
Abstract summary: Large scale neural models show impressive performance across a wide array of linguistic tasks. Despite this they remain, largely, black-boxes, inducing vector-representations of their input that prove difficult to interpret. We introduce a novel approach to interpretability that looks at the mapping a model learns from sentences to representations as a kind of language in its own right.
Score: 7.2129390689756185
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large scale neural models show impressive performance across a wide array of linguistic tasks. Despite this they remain, largely, black-boxes - inducing vector-representations of their input that prove difficult to interpret. This limits our ability to understand what they learn, and when the learn it, or describe what kinds of representations generalise well out of distribution. To address this we introduce a novel approach to interpretability that looks at the mapping a model learns from sentences to representations as a kind of language in its own right. In doing so we introduce a set of information-theoretic measures that quantify how structured a model's representations are with respect to its input, and when during training that structure arises. Our measures are fast to compute, grounded in linguistic theory, and can predict which models will generalise best based on their representations. We use these measures to describe two distinct phases of training a transformer: an initial phase of in-distribution learning which reduces task loss, then a second stage where representations becoming robust to noise. Generalisation performance begins to increase during this second phase, drawing a link between generalisation and robustness to noise. Finally we look at how model size affects the structure of the representational space, showing that larger models ultimately compress their representations more than their smaller counterparts.

Related papers

Enhancing Pre-trained Representation Classifiability can Boost its Interpretability [112.296393156262]
We quantify the representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations.<n>We propose the Inherent Interpretability Score (IIS) that evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the representation interpretability.
arXiv Detail & Related papers (2025-10-28T06:21:06Z)
A Markov Categorical Framework for Language Modeling [9.910562011343009]
Autoregressive language models achieve remarkable performance, yet a unified theory explaining their internal mechanisms, how training shapes their representations, and enables complex behaviors, remains elusive.<n>We introduce a new analytical framework that models the single-step generation process as a composition of information-processing stages using the language of Markov categories.<n>This work presents a powerful new lens for understanding how information flows through a model and how the training objective shapes its internal geometry.
arXiv Detail & Related papers (2025-07-25T13:14:03Z)
On Support Samples of Next Word Prediction [14.854557537744405]
This paper investigates emphdata-centric interpretability in language models.<n>Using representer theorem, we identify two types of emphsupport samples--those that either promote or deter specific predictions.
arXiv Detail & Related papers (2025-06-04T15:13:22Z)
Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation [3.8073142980733]
This thesis introduces quantitative methods for identifying systematic structure in a mapping between spaces.<n>I identify structural primitives present in a mapping, along with information theoretics of each.<n>I also introduce a novel, performant, approach to estimating the entropy of vector space, that allows this analysis to be applied to models ranging in size from 1 million to 12 billion parameters.
arXiv Detail & Related papers (2025-05-29T19:27:50Z)
From Text to Graph: Leveraging Graph Neural Networks for Enhanced Explainability in NLP [3.864700176441583]
This study proposes a novel methodology to achieve explainability in natural language processing tasks. It automatically converts sentences into graphs and maintains semantics through nodes and relations. Experiments delivered promising results in determining the most critical components within the text structure for a given classification.
arXiv Detail & Related papers (2025-04-02T18:55:58Z)
Solvable Dynamics of Self-Supervised Word Embeddings and the Emergence of Analogical Reasoning [3.519547280344187]
We study a class of solvable contrastive self-supervised algorithms which we term quadratic word embedding models. Our solutions reveal that these models learn linear subspaces one at a time, each one incrementing the effective rank of the embeddings until model capacity is saturated. We use our dynamical theory to predict how and when models acquire the ability to complete analogies.
arXiv Detail & Related papers (2025-02-14T02:16:48Z)
ICLR: In-Context Learning of Representations [19.331483579806623]
We show that as the amount of context is scaled, there is a sudden re-organization from pretrained semantic representations to in-context representations aligned with the graph structure. Our findings indicate scaling context-size can flexibly re-organize model representations, possibly unlocking novel capabilities.
arXiv Detail & Related papers (2024-12-29T18:58:09Z)
Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme. We first train the encoders via a mix that leverages both randomness in masking language modeling and the structure aspect of programming language. We then enhance the representations via contrastive learning with hard negative and hard positive constructed in an unsupervised manner.
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$textEVL_textGen$ is a framework designed for the pre-training of visually conditioned language generation models. We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval.
arXiv Detail & Related papers (2022-12-20T16:03:25Z)
Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains. How do language models of different sizes learn during pre-training? Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
Bidirectional Representations for Low Resource Spoken Language Understanding [39.208462511430554]
We propose a representation model to encode speech in bidirectional rich encodings. The approach uses a masked language modelling objective to learn the representations. We show that the performance of the resulting encodings is better than comparable models on multiple datasets.
arXiv Detail & Related papers (2022-11-24T17:05:16Z)
High-dimensional distributed semantic spaces for utterances [0.2907403645801429]
This paper describes a model for high-dimensional representation for utterance and text level data. It is based on a mathematically principled and behaviourally plausible approach to representing linguistic information. The paper shows how the implemented model is able to represent a broad range of linguistic features in a common integral framework of fixed dimensionality.
arXiv Detail & Related papers (2021-04-01T12:09:47Z)
Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data. We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations. Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation. We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.