Spying on your neighbors: Fine-grained probing of contextual embeddings
for information about surrounding words
- URL: http://arxiv.org/abs/2005.01810v1
- Date: Mon, 4 May 2020 19:34:46 GMT
- Title: Spying on your neighbors: Fine-grained probing of contextual embeddings
for information about surrounding words
- Authors: Josef Klafka and Allyson Ettinger
- Abstract summary: We introduce a suite of probing tasks that enable fine-grained testing of contextual embeddings for encoding of information about surrounding words.
We examine the popular BERT, ELMo and GPT contextual encoders and find that each of our tested information types is indeed encoded as contextual information across tokens.
We discuss implications of these results for how different types of models break down and prioritize word-level context information when constructing token embeddings.
- Score: 12.394077144994617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although models using contextual word embeddings have achieved
state-of-the-art results on a host of NLP tasks, little is known about exactly
what information these embeddings encode about the context words that they are
understood to reflect. To address this question, we introduce a suite of
probing tasks that enable fine-grained testing of contextual embeddings for
encoding of information about surrounding words. We apply these tasks to
examine the popular BERT, ELMo and GPT contextual encoders, and find that each
of our tested information types is indeed encoded as contextual information
across tokens, often with near-perfect recoverability, but the encoders vary in
which features they distribute to which tokens, how nuanced their distributions
are, and how robust the encoding of each feature is to distance. We discuss
implications of these results for how different types of models break down and
prioritize word-level context information when constructing token embeddings.
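The probing setup described above can be illustrated with a small, self-contained sketch. The code below is a minimal illustration rather than the authors' implementation: it assumes Hugging Face transformers and scikit-learn are available, uses BERT as the encoder, a hypothetical four-sentence toy dataset, and a simple linear probe that tries to recover a feature of the following word (grammatical number) from the contextual embedding of a different token.
```python
# Minimal sketch of a neighbor-probing task, assuming the `transformers`
# and scikit-learn packages; dataset and probe choice are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Toy probe data: sentence, whitespace index of the probed token, and a
# binary feature of its right-hand neighbor (1 = plural, 0 = singular).
examples = [
    ("the chef sees the dogs", 3, 1),   # embed "the", predict number of "dogs"
    ("the chef sees the dog", 3, 0),
    ("a girl likes those cars", 3, 1),
    ("a girl likes that car", 3, 0),
]

def token_embedding(sentence, word_idx):
    """Return the contextual embedding of the word at `word_idx`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, dim)
    # Map the whitespace word index to its first wordpiece position.
    piece_pos = enc.word_ids(0).index(word_idx)
    return hidden[piece_pos].numpy()

X = [token_embedding(s, i) for s, i, _ in examples]
y = [label for _, _, label in examples]

# Linear probe: if the neighbor's feature is recoverable from this token's
# embedding, it is encoded there as contextual information.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```
In the paper's terms, high probe accuracy on held-out data would indicate that the neighboring word's feature is encoded as contextual information in the probed token's embedding; the toy data here is only meant to show the shape of the pipeline.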
Related papers
- Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves NER robustness without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z) - Dissecting Paraphrases: The Impact of Prompt Syntax and Supplementary Information on Knowledge Retrieval from Pretrained Language Models [8.588056811772693]
ConPARE-LAMA is a probe consisting of 34 million distinct prompts that facilitate comparison across minimal paraphrases.
ConPARE-LAMA enables insights into the independent impact of either syntactical form or semantic information of paraphrases on the knowledge retrieval performance of PLMs.
arXiv Detail & Related papers (2024-04-02T14:35:08Z) - Text-To-KG Alignment: Comparing Current Methods on Classification Tasks [2.191505742658975]
Knowledge graphs (KGs) provide dense and structured representations of factual information.
Recent work has focused on creating pipeline models that retrieve information from KGs as additional context.
It is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query.
arXiv Detail & Related papers (2023-06-05T13:45:45Z) - What Are You Token About? Dense Retrieval as Distributions Over the
Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw a connection between them and sparse retrieval (a minimal sketch of this projection appears after the related-papers list below).
arXiv Detail & Related papers (2022-12-20T16:03:25Z) - Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code, along with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z) - Span Classification with Structured Information for Disfluency Detection
in Spoken Utterances [47.05113261111054]
We propose a novel architecture for detecting disfluencies in transcripts from spoken utterances.
Our proposed model achieves state-of-the-art results on the widely used English Switchboard corpus for disfluency detection.
arXiv Detail & Related papers (2022-03-30T03:22:29Z) - KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization
for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt).
We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
arXiv Detail & Related papers (2021-04-15T17:57:43Z) - On the Evolution of Syntactic Information Encoded by BERT's
Contextualized Representations [11.558645364193486]
In this paper, we analyze the evolution of the embedded syntax trees over the course of fine-tuning BERT for six different tasks.
Experimental results show that the encoded information is forgotten (PoS tagging), reinforced (dependency and constituency parsing), or preserved (semantics-related tasks) in different ways during fine-tuning, depending on the task.
arXiv Detail & Related papers (2021-01-27T15:41:09Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely used large-scale dataset for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - A Survey on Contextual Embeddings [48.04732268018772]
Contextual embeddings assign each word a representation based on its context, capturing uses of words across varied contexts and encoding knowledge that transfers across languages.
We review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
arXiv Detail & Related papers (2020-03-16T15:22:22Z)
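As a companion to the "What Are You Token About?" entry above, the sketch below shows one way to project a dense representation into a model's vocabulary space. It assumes Hugging Face transformers; using plain BERT with its masked-language-modeling head as a stand-in for the paper's dual encoders, and the example query, are illustrative assumptions rather than the paper's actual setup.
```python
# Minimal sketch: project a dense query vector into BERT's vocabulary space
# via the MLM head; BERT here is an assumed stand-in for a dual encoder.
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

query = "what is the capital of france"
enc = tokenizer(query, return_tensors="pt")
with torch.no_grad():
    hidden = model.bert(**enc).last_hidden_state   # (1, seq_len, dim)
    query_vec = hidden[:, 0]                       # [CLS] vector as the dense query
    logits = model.cls(query_vec)                  # project into vocabulary space
probs = torch.softmax(logits, dim=-1)

# The highest-probability vocabulary items give a readable view of the vector.
top = torch.topk(probs[0], k=10)
print(tokenizer.convert_ids_to_tokens(top.indices.tolist()))
```
Inspecting the top-ranked vocabulary items gives a human-readable view of what the dense vector emphasizes, which is the kind of interpretation, and connection to sparse retrieval, that the paper draws.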
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.