Probing with Noise: Unpicking the Warp and Weft of Embeddings
- URL: http://arxiv.org/abs/2210.12206v1
- Date: Fri, 21 Oct 2022 19:33:33 GMT
- Title: Probing with Noise: Unpicking the Warp and Weft of Embeddings
- Authors: Filip Klubička, John D. Kelleher
- Abstract summary: We argue that it is possible for the vector norm to also carry linguistic information.
We develop a method to test this: an extension of the probing framework.
We find evidence that confirms the existence of separate information containers in English GloVe and BERT embeddings.
- Score: 2.9874726192215157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving our understanding of how information is encoded in vector space can
yield valuable interpretability insights. Alongside vector dimensions, we argue
that it is possible for the vector norm to also carry linguistic information.
We develop a method to test this: an extension of the probing framework which
allows for relative intrinsic interpretations of probing results. It relies on
introducing noise that ablates information encoded in embeddings, grounded in
random baselines and confidence intervals. We apply the method to
well-established probing tasks and find evidence that confirms the existence of
separate information containers in English GloVe and BERT embeddings. Our
correlation analysis aligns with the experimental findings that different
encoders use the norm to encode different kinds of information: GloVe stores
syntactic and sentence length information in the vector norm, while BERT uses
it to encode contextual incongruity.
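As a concrete illustration of the ablation idea described in the abstract, the minimal sketch below compares a diagnostic probe trained on intact embeddings with probes trained on norm-ablated and dimension-ablated copies. The function names and the random stand-in data are illustrative assumptions, not the authors' code; a real experiment would plug in GloVe or BERT embeddings with probing-task labels and compare each result against random baselines with confidence intervals, as the paper describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ablate_norm(X):
    # Remove information carried by the norm: rescale every vector to unit
    # length so only the dimension values (the direction) remain.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.clip(norms, 1e-12, None)

def ablate_dimensions(X, rng):
    # Remove information carried by the dimension values: replace each
    # vector's direction with random noise while preserving its norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    noise = rng.standard_normal(X.shape)
    noise /= np.clip(np.linalg.norm(noise, axis=1, keepdims=True), 1e-12, None)
    return noise * norms

def probe_accuracy(X_tr, y_tr, X_te, y_te):
    # A simple diagnostic classifier; its test accuracy estimates how much
    # task-relevant information the (possibly ablated) embeddings retain.
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Stand-in data; replace with real embeddings and probing-task labels.
rng = np.random.default_rng(0)
X_tr, y_tr = rng.standard_normal((500, 300)), rng.integers(0, 2, 500)
X_te, y_te = rng.standard_normal((200, 300)), rng.integers(0, 2, 200)

print("intact      :", probe_accuracy(X_tr, y_tr, X_te, y_te))
print("norm ablated:", probe_accuracy(ablate_norm(X_tr), y_tr, ablate_norm(X_te), y_te))
print("dims ablated:", probe_accuracy(ablate_dimensions(X_tr, rng), y_tr,
                                      ablate_dimensions(X_te, rng), y_te))
```

If accuracy stays well above the random baseline after one ablation but not the other, that is evidence the probed property lives in the container that was left intact.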
Related papers
- Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification [1.6021932740447968]
Analyses of transformer-based models have shown that they encode a variety of linguistic information from their textual input.
We test to what degree information about chunks (in particular noun, verb or prepositional phrases) can be localized in sentence embeddings.
Our results show that such information is not distributed over the entire sentence embedding, but rather it is encoded in specific regions.
arXiv Detail & Related papers (2024-07-25T15:27:08Z)
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection [71.20871905457174]
Language models (LMs) have revolutionized the way we interact with information, but they often generate nonfactual text.
Previous methods use external knowledge as references to enhance factuality, but often struggle when irrelevant references are mixed in.
We present DKGen, which divides the text generation process into iterations.
arXiv Detail & Related papers (2023-08-30T02:22:40Z)
- Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space [2.5288257442251107]
The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings.
We perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings.
Our experiments indicate that both encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm.
arXiv Detail & Related papers (2023-04-27T17:06:20Z)
- Norm of Word Embedding Encodes Information Gain [7.934452214142754]
We show that the squared norm of a static word embedding encodes the information gain conveyed by the word (a toy computation of this quantity appears after this list).
We also demonstrate that both the KL divergence and the squared norm of the embedding provide a useful metric of a word's informativeness.
arXiv Detail & Related papers (2022-12-19T17:45:07Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- The Interpretable Dictionary in Sparse Coding [4.205692673448206]
In our work, we illustrate that an ANN, trained using sparse coding under specific sparsity constraints, yields a more interpretable model than the standard deep learning model.
The dictionary learned by sparse coding can be more easily understood, and the activations of its elements create a selective feature output.
arXiv Detail & Related papers (2020-11-24T00:26:40Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces capture structural and semantic properties to different degrees.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition of positional embeddings to word embeddings introduces mixed correlations between the two.
We propose a new positional encoding method, Transformer with Untied Positional Encoding (TUPE).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
- Spying on your neighbors: Fine-grained probing of contextual embeddings for information about surrounding words [12.394077144994617]
We introduce a suite of probing tasks that enable fine-grained testing of contextual embeddings for encoding of information about surrounding words.
We examine the popular BERT, ELMo and GPT contextual encoders and find that each of our tested information types is indeed encoded as contextual information across tokens.
We discuss implications of these results for how different types of models break down and prioritize word-level context information when constructing token embeddings.
arXiv Detail & Related papers (2020-05-04T19:34:46Z)
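As a toy illustration of the information-gain quantity referenced in the "Norm of Word Embedding Encodes Information Gain" entry above, the sketch below computes KL(p(c|w) || p(c)) from invented co-occurrence counts. The words and counts are made up for illustration, and the link to embedding norms is that paper's claim rather than something this snippet verifies.

```python
import numpy as np

# Toy co-occurrence counts: rows = target words, columns = context words.
words = ["the", "quantum", "pizza"]
counts = np.array([
    [50, 40, 45, 60, 55],   # "the": co-occurs broadly and evenly
    [ 1, 80,  2,  1,  1],   # "quantum": concentrated in few contexts
    [ 2,  1, 70,  5,  3],   # "pizza": concentrated in few contexts
], dtype=float)

p_ctx_given_w = counts / counts.sum(axis=1, keepdims=True)   # p(c | w)
p_ctx = counts.sum(axis=0) / counts.sum()                    # p(c)

# Information gain of each word: KL( p(c|w) || p(c) ).
info_gain = np.sum(p_ctx_given_w * np.log(p_ctx_given_w / p_ctx), axis=1)

for word, ig in zip(words, info_gain):
    print(f"{word:>8}: KL = {ig:.3f}")
# Topical words ("quantum", "pizza") score higher than the function word
# ("the"); the cited paper reports that the squared norm of a skip-gram
# word embedding tracks this information gain.
```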
This list is automatically generated from the titles and abstracts of the papers on this site.