Language Models are Injective and Hence Invertible
- URL: http://arxiv.org/abs/2510.15511v3
- Date: Tue, 21 Oct 2025 14:44:49 GMT
- Title: Language Models are Injective and Hence Invertible
- Authors: Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodolà
- Abstract summary: Transformer components such as non-linear activations and normalization are inherently non-injective. We prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective. We introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations.
- Score: 26.862644074381844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the input from a model's representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models mapping discrete input sequences to their corresponding sequence of continuous representations are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions. Third, we operationalize injectivity: we introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations, establishing linear-time guarantees and demonstrating exact invertibility in practice. Overall, our work establishes injectivity as a fundamental and exploitable property of language models, with direct implications for transparency, interpretability, and safe deployment.
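A minimal sketch of why injectivity implies invertibility, assuming a deterministic HuggingFace causal LM (the model name, tolerance, and brute-force vocabulary scan are illustrative only; the paper's SipIt achieves the same recovery with linear-time guarantees rather than exhaustive search):

```python
# Illustrative brute-force inversion: if the map from token sequences to
# hidden states is injective, each position's hidden state identifies the
# token that produced it. This is NOT the paper's SipIt algorithm, only a
# naive (and slow) O(seq_len * vocab) demonstration of the same principle.
import torch
from transformers import AutoModel, AutoTokenizer

name = "gpt2"  # placeholder model choice
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

@torch.no_grad()
def last_hidden(ids):
    # ids: (1, t) token ids -> (t, d) final-layer hidden states
    return model(input_ids=ids).last_hidden_state[0]

@torch.no_grad()
def invert(target):
    # target: (t, d) hidden states of an unknown input sequence
    recovered = []
    for t in range(target.shape[0]):
        for cand in range(tok.vocab_size):  # exhaustive scan, for exposition only
            ids = torch.tensor([recovered + [cand]])
            h = last_hidden(ids)[t]
            if torch.allclose(h, target[t], atol=1e-5):  # unique match if injective
                recovered.append(cand)
                break
    return tok.decode(recovered)

secret = tok("the cat sat", return_tensors="pt").input_ids
assert invert(last_hidden(secret)) == "the cat sat"
```

Because causal attention makes the hidden state at position t depend only on tokens up to t, matching one position at a time suffices; SipIt replaces the exhaustive candidate scan with a provably efficient procedure.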
Related papers
- On the Identifiability of Steering Vectors in Large Language Models [0.0]
Activation steering methods are widely used to control large language model behavior, and their use implicitly assumes that steering directions are identifiable and uniquely recoverable from input-output behavior. We prove that steering vectors are fundamentally non-identifiable due to large equivalence classes of behaviorally indistinguishable interventions.
arXiv Detail & Related papers (2026-02-06T15:53:50Z)
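For context on the entry above: activation steering typically adds a fixed vector to one layer's hidden states at inference time. A minimal PyTorch sketch, where the block, vector v, and scale alpha are placeholders:

```python
# Minimal activation-steering sketch: add a fixed direction v to the hidden
# states flowing out of one transformer block. The paper's point is that many
# (v, alpha, layer) choices induce identical input-output behavior, so the
# steering direction cannot be uniquely recovered from behavior alone.
import torch

def steer(block, v, alpha=1.0):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * v  # broadcast over batch and positions
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return block.register_forward_hook(hook)  # call .remove() to undo
```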
- Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z)
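A toy sketch of the recipe this summary describes, with the latent transition written as a sparse mixture of learned components (dimensions, component form, and the L1 gate are invented for illustration; the paper's probability flow model is more elaborate):

```python
# Toy sketch: latent dynamics as a sparse combination of K learned
# transformation components, z_{t+1} = z_t + sum_k w_k * f_k(z_t),
# with an L1 penalty pushing most mixing weights w_k toward zero.
import torch
import torch.nn as nn

class SparseTransition(nn.Module):
    def __init__(self, dim=16, k=8):
        super().__init__()
        self.components = nn.ModuleList(nn.Linear(dim, dim) for _ in range(k))
        self.gate = nn.Linear(dim, k)  # per-step mixing weights

    def forward(self, z):  # z: (batch, dim)
        w = self.gate(z)                                              # (batch, k)
        deltas = torch.stack([f(z) for f in self.components], dim=1)  # (batch, k, dim)
        z_next = z + (w.unsqueeze(-1) * deltas).sum(dim=1)
        sparsity = w.abs().mean()  # add to the training loss
        return z_next, sparsity
```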
- BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation [23.765789561546715]
Autoregressive generative models play a key role in various language tasks, especially for modeling and evaluating long text sequences. In this work, we observe that fitting transformer-based model embeddings into a stochastic process yields ordered latent representations from originally unordered model outputs. We introduce BBScoreV2, a novel likelihood-based evaluation metric, with both intuitive and quantitative evidence supporting its effectiveness.
arXiv Detail & Related papers (2024-05-28T02:33:38Z)
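The original BBScore fits embeddings to a Brownian bridge, so a hedged sketch of the underlying likelihood computation looks like this (the 1-D projection, fixed sigma, and use of marginal rather than transition densities are my simplifications):

```python
# Score a sequence of (1-D projected) embeddings under a Brownian bridge
# pinned at its endpoints. Marginal mean at step t: a + (t/T)(b - a);
# marginal variance: sigma^2 * t(T - t) / T.
import numpy as np
from scipy.stats import norm

def bridge_loglik(x, sigma=1.0):
    # x: (T+1,) observations; endpoints x[0] and x[-1] define the bridge
    T = len(x) - 1
    a, b = x[0], x[-1]
    ll = 0.0
    for t in range(1, T):
        mean = a + (t / T) * (b - a)
        var = sigma**2 * t * (T - t) / T
        ll += norm.logpdf(x[t], loc=mean, scale=np.sqrt(var))
    return ll  # higher = more bridge-like latent trajectory
```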
- Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability [1.223779595809275]
Prototype Generation is a stricter and more robust form of feature visualisation for model-agnostic, data-independent interpretability of image classification models.
We demonstrate its ability to generate inputs that result in natural activation paths, countering previous claims that feature visualisation algorithms are untrustworthy due to unnatural internal activations.
arXiv Detail & Related papers (2023-09-29T11:16:06Z)
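Prototype Generation belongs to the activation-maximisation family of feature visualisation. A bare-bones sketch of that baseline (optimiser settings are arbitrary, and the regularisers the paper adds to keep internal activations natural are omitted):

```python
# Gradient-ascent prototype: optimize an input image to maximize one class
# logit of a classifier. Prototype Generation constrains this process so the
# result triggers natural activation paths; this loop is the raw baseline.
import torch

def prototype(model, class_idx, steps=256, lr=0.05, shape=(1, 3, 224, 224)):
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -model(x)[0, class_idx]  # ascend the target class logit
        loss.backward()
        opt.step()
    return x.detach()
```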
- Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions [9.909170013118775]
This work presents a linear decomposition of final hidden states from autoregressive language models based on each initial input token.
Using the change in next-word probability as a measure of importance, this work first examines which context words make the biggest contribution to language model predictions.
arXiv Detail & Related papers (2023-05-17T23:55:32Z)
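A sketch of the importance measure described above, using deletion-based ablation as a simple stand-in for the paper's exact linear decomposition (the ablation strategy is my simplification; the paper decomposes the hidden state itself):

```python
# Importance of each context token = drop in the model's probability of the
# observed next word when that token is removed from the context.
import torch
from transformers import AutoModelForCausalLM

@torch.no_grad()
def next_word_prob(model, ids, next_id):
    logits = model(input_ids=ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)[next_id].item()

@torch.no_grad()
def token_importance(model, ids, next_id):
    # ids: (1, n) context token ids; next_id: the observed next token
    base = next_word_prob(model, ids, next_id)
    scores = []
    for i in range(ids.shape[1]):
        ablated = torch.cat([ids[:, :i], ids[:, i + 1:]], dim=1)
        scores.append(base - next_word_prob(model, ablated, next_id))
    return scores  # larger = that context word mattered more
```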
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z)
- NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries? [61.58261351116679]
We introduce a two-stage adversarial example generation framework (NaturalAdversaries) for natural language understanding tasks.
It is adaptable to both black-box and white-box adversarial attacks based on the level of access to the model parameters.
Our results indicate these adversaries generalize across domains, and offer insights for future research on improving robustness of neural text classification models.
arXiv Detail & Related papers (2022-11-08T16:37:34Z)
- Pretrained Transformers as Universal Computation Engines [105.00539596788127]
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning.
We study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction.
We find that such pretraining enables the resulting Frozen Pretrained Transformer (FPT) to generalize zero-shot to these modalities, matching the performance of a transformer fully trained on these tasks.
arXiv Detail & Related papers (2021-03-09T06:39:56Z)
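A minimal sketch of the frozen-transformer setup with a GPT-2 backbone (module-name matching and layer sizes are illustrative; following the paper's description, only the new input/output layers, layer norms, and positional embeddings stay trainable):

```python
# Frozen Pretrained Transformer (FPT) sketch: keep self-attention and
# feed-forward weights frozen; train only a new input projection, a new
# output head, plus layer-norm and positional-embedding parameters.
import torch.nn as nn
from transformers import GPT2Model

backbone = GPT2Model.from_pretrained("gpt2")
for name, p in backbone.named_parameters():
    # Unfreeze layer norms ("ln_*") and position embeddings ("wpe") only.
    p.requires_grad = ("ln" in name) or ("wpe" in name)

patch_in = nn.Linear(64, backbone.config.n_embd)   # task-specific input dim (placeholder)
head_out = nn.Linear(backbone.config.n_embd, 10)   # e.g., 10-way classification

def forward(x):  # x: (batch, seq, 64) non-language input tokens
    h = backbone(inputs_embeds=patch_in(x)).last_hidden_state
    return head_out(h[:, -1])  # classify from the final position
```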
- Autoencoding Variational Autoencoder [56.05008520271406]
A variational autoencoder (VAE) need not consistently encode the typical samples generated by its own decoder. We study the implications of this behaviour for the learned representations, and also the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
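A hedged sketch of one way to phrase the self-consistency requirement: re-encoding a decoded sample should recover the latent that produced it (the squared-error form is my minimal stand-in for the paper's exact objective):

```python
# Self-consistency sketch for a VAE: decode a latent z, re-encode the
# generated sample, and penalize disagreement between z and the re-encoded
# posterior mean. Added to the usual ELBO during training.
import torch

def self_consistency_loss(encoder, decoder, z):
    x_gen = decoder(z)            # decode a draw z from the prior
    mu, logvar = encoder(x_gen)   # posterior of the generated sample
    return ((mu - z) ** 2).mean() # encoder should map x_gen back to z
```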
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
However, VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
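Vector quantization is one standard way to realize a discrete latent bottleneck; a generic sketch (this is the common VQ mechanism, not necessarily the paper's exact construction):

```python
# Vector-quantized bottleneck: snap each continuous latent to its nearest
# codebook entry, giving a compact discrete latent space that a strong
# auto-regressive decoder cannot simply ignore.
import torch
import torch.nn as nn

class VQBottleneck(nn.Module):
    def __init__(self, codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(codes, dim)

    def forward(self, z):  # z: (batch, dim) continuous latents
        d = torch.cdist(z, self.codebook.weight)  # distances to all codes
        idx = d.argmin(dim=-1)                    # discrete code index
        q = self.codebook(idx)
        # Straight-through estimator: copy gradients past the quantization.
        return z + (q - z).detach(), idx
```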
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.