Reconstruction Probing
- URL: http://arxiv.org/abs/2212.10792v1
- Date: Wed, 21 Dec 2022 06:22:03 GMT
- Title: Reconstruction Probing
- Authors: Najoung Kim, Jatin Khilnani, Alex Warstadt, Abed Qaddoumi
- Abstract summary: We propose a new analysis method for contextualized representations based on reconstruction probabilities in masked language models.
We find that contextualization boosts the reconstructability of tokens that are close to the token being reconstructed in terms of linear and syntactic distance.
We extend our analysis to a finer-grained decomposition of contextualized representations, and we find that these boosts are largely attributable to static and positional embeddings at the input layer.
- Score: 7.647452554776166
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose reconstruction probing, a new analysis method for contextualized
representations based on reconstruction probabilities in masked language models
(MLMs). This method relies on comparing the reconstruction probabilities of
tokens in a given sequence when conditioned on the representation of a single
token that has been fully contextualized and when conditioned on only the
decontextualized lexical prior of the model. This comparison can be understood
as quantifying the contribution of contextualization towards reconstruction --
the difference in the reconstruction probabilities can only be attributed to
the representational change of the single token induced by contextualization.
We apply this analysis to three MLMs and find that contextualization boosts
reconstructability of tokens that are close to the token being reconstructed in
terms of linear and syntactic distance. Furthermore, we extend our analysis to
finer-grained decomposition of contextualized representations, and we find that
these boosts are largely attributable to static and positional embeddings at
the input layer.
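As a rough illustration of the comparison described in the abstract, the sketch below (assuming a standard Hugging Face masked LM such as bert-base-uncased) contrasts the probability of recovering each token when a single cue token is visible with the probability under a fully masked input. Unlike the paper's actual procedure, the cue token here is simply left unmasked rather than being replaced by a representation contextualized by the full sentence, so this only illustrates the probability comparison, not the exact method.

```python
# Minimal, approximate sketch of the reconstruction-probing comparison.
# Assumption: a standard Hugging Face masked LM; the paper's conditioning
# procedure (feeding a fully contextualized single-token representation)
# is simplified here to "leave one token unmasked".
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

ids = tokenizer("the cat sat on the mat", return_tensors="pt")["input_ids"][0]
mask_id = tokenizer.mask_token_id
content = torch.arange(1, len(ids) - 1)          # skip [CLS]/[SEP]

def reconstruction_probs(visible):
    """Mask every content token except those in `visible` and return the
    probability the MLM assigns to each true content token."""
    masked = ids.clone()
    for i in content.tolist():
        if i not in visible:
            masked[i] = mask_id
    with torch.no_grad():
        probs = model(masked.unsqueeze(0)).logits[0].softmax(-1)
    return probs[content, ids[content]]

cue = 2                                           # index of the single cue token
p_cue = reconstruction_probs({cue})               # conditioned on the cue token
p_prior = reconstruction_probs(set())             # decontextualized lexical prior
print(p_cue.log() - p_prior.log())                # contribution of contextualization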
Related papers
- How much do contextualized representations encode long-range context? [10.188367784207049]
We analyze contextual representations in neural autoregressive language models, emphasizing long-range contexts that span several thousand tokens.
Our methodology employs a perturbation setup and the Anisotropy-Calibrated Cosine Similarity metric to capture the degree of contextualization of long-range patterns from the perspective of representation geometry.
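A hedged sketch of an anisotropy-calibrated similarity of this kind, where calibration is approximated by subtracting the mean cosine similarity among unrelated representations; the exact calibration used in the cited paper may differ.

```python
# Hedged sketch: cosine similarity between two representations, calibrated by
# subtracting the average cosine similarity among unrelated representations
# as a crude anisotropy baseline (an assumption, not the paper's exact metric).
import torch
import torch.nn.functional as F

def anisotropy_calibrated_cos(x, y, baseline_reps):
    """x, y: (d,) representations to compare; baseline_reps: (n, d) unrelated
    representations used to estimate the anisotropy baseline."""
    cos_xy = F.cosine_similarity(x, y, dim=-1)
    b = F.normalize(baseline_reps, dim=-1)
    sims = b @ b.T
    n = len(b)
    off_diag = sims.masked_fill(torch.eye(n, dtype=torch.bool), 0.0)
    baseline = off_diag.sum() / (n * (n - 1))     # mean pairwise similarity
    return cos_xy - baseline
```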
arXiv Detail & Related papers (2024-10-16T06:49:54Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - Representing and Computing Uncertainty in Phonological Reconstruction [5.284425534494986]
Despite the inherently fuzzy nature of reconstructions in historical linguistics, most scholars do not represent their uncertainty when proposing proto-forms.
We present a new framework that allows for the representation of uncertainty in linguistic reconstruction and also includes a workflow for the computation of fuzzy reconstructions from linguistic data.
arXiv Detail & Related papers (2023-10-19T13:27:42Z) - From Bricks to Bridges: Product of Invariances to Enhance Latent Space Communication [19.336940758147442]
It has been observed that representations learned by distinct neural networks conceal structural similarities when the models are trained under similar inductive biases.
We introduce a versatile method to directly incorporate a set of invariances into the representations, constructing a product space of invariant components on top of the latent representations.
We validate our solution on classification and reconstruction tasks, observing consistent latent similarity and downstream performance improvements in a zero-shot stitching setting.
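A rough sketch of the general idea, not the paper's exact construction: each latent is described by its relation to a set of anchor latents under two similarity functions with different invariances, and the resulting components are concatenated into a product space. Anchor selection and the specific invariances are assumptions here.

```python
# Illustrative sketch of a "product of invariant components" built on top of
# latent representations; the cited paper's actual invariances and aggregation
# may differ.
import torch
import torch.nn.functional as F

def product_of_invariances(z, anchors):
    """z: (n, d) latents, anchors: (k, d) anchor latents; returns (n, 2k)."""
    # cosine similarities to anchors: invariant to per-sample rescaling
    cos_part = F.normalize(z, dim=-1) @ F.normalize(anchors, dim=-1).T
    # negative Euclidean distances to anchors: invariant to joint rotations/translations
    dist_part = -torch.cdist(z, anchors)
    return torch.cat([cos_part, dist_part], dim=-1)
```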
arXiv Detail & Related papers (2023-10-02T13:55:38Z) - A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks [29.764014766305174]
We show how a transformer model is able to perform in-context learning (ICL) under reasonable assumptions on the pre-training process and the downstream tasks.
We establish that this entire procedure is implementable using the transformer mechanism.
arXiv Detail & Related papers (2023-05-26T15:49:43Z) - Bayesian Recurrent Units and the Forward-Backward Algorithm [91.39701446828144]
Using Bayes's theorem, we derive a unit-wise recurrence as well as a backward recursion similar to the forward-backward algorithm.
The resulting Bayesian recurrent units can be integrated as recurrent neural networks within deep learning frameworks.
Experiments on speech recognition indicate that adding the derived units at the end of state-of-the-art recurrent architectures can improve the performance at a very low cost in terms of trainable parameters.
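For reference, the classic HMM forward-backward recursions that the derived backward recursion is said to resemble are sketched below; this is standard background, not the Bayesian recurrent unit itself.

```python
# Classic HMM forward-backward recursions (standard background reference only).
import torch

def forward_backward(pi, A, B, obs):
    """pi: (S,) initial state probs, A: (S, S) transition probs,
    B: (S, V) emission probs, obs: (T,) observation indices."""
    T, S = len(obs), len(pi)
    alpha = torch.zeros(T, S)
    beta = torch.ones(T, S)
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                      # forward recursion
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):             # backward recursion
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                       # unnormalized posteriors
    return gamma / gamma.sum(dim=1, keepdim=True)
```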
arXiv Detail & Related papers (2022-07-21T14:00:52Z) - Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z) - Entangled Residual Mappings [59.02488598557491]
We introduce entangled residual mappings to generalize the structure of the residual connections.
An entangled residual mapping replaces the identity skip connections with specialized entangled mappings.
We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks.
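An illustrative sketch of the general idea, with the identity skip connection swapped for an arbitrary caller-supplied mapping; the specific entangled mappings and their parameterizations studied in the cited paper are not reproduced here.

```python
# Sketch of a residual block whose identity skip is replaced by a generic
# "entangled" mapping (assumption: any module of matching dimensionality).
import torch
import torch.nn as nn

class EntangledResidualBlock(nn.Module):
    def __init__(self, dim, skip_mapping=None):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # identity skip by default; pass a specialized mapping to "entangle" it
        self.skip = skip_mapping if skip_mapping is not None else nn.Identity()

    def forward(self, x):
        return self.skip(x) + self.body(x)
```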
arXiv Detail & Related papers (2022-06-02T19:36:03Z) - Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation [66.85202434812942]
We reformulate few-shot segmentation as a semantic reconstruction problem.
We convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction.
Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems.
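A hedged sketch of the reconstruction step only: a novel-class query feature is projected onto the span of base-class basis vectors by least squares. How the bases are built and the anti-aliasing components of ASR are omitted here.

```python
# Illustration of reconstructing a novel-class feature in the span of
# base-class basis vectors (assumption: plain least-squares projection).
import torch

def reconstruct_from_bases(query, bases):
    """query: (d,) novel-class feature; bases: (k, d) base-class basis vectors."""
    # solve bases.T @ coeffs ≈ query in the least-squares sense
    coeffs = torch.linalg.lstsq(bases.T, query.unsqueeze(-1)).solution
    return (bases.T @ coeffs).squeeze(-1)      # reconstruction within the base span
```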
arXiv Detail & Related papers (2021-06-01T02:17:36Z) - Towards a Theoretical Understanding of the Robustness of Variational Autoencoders [82.68133908421792]
We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations.
We develop a novel criterion for robustness in probabilistic models: $r$-robustness.
We show that VAEs trained using disentangling methods score well under our robustness metrics.
arXiv Detail & Related papers (2020-07-14T21:22:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.