Picking BERT's Brain: Probing for Linguistic Dependencies in
Contextualized Embeddings Using Representational Similarity Analysis
- URL: http://arxiv.org/abs/2011.12073v1
- Date: Tue, 24 Nov 2020 13:19:06 GMT
- Title: Picking BERT's Brain: Probing for Linguistic Dependencies in
Contextualized Embeddings Using Representational Similarity Analysis
- Authors: Michael A. Lepori, R. Thomas McCoy
- Abstract summary: We investigate the degree to which a verb embedding encodes the verb's subject, a pronoun embedding encodes the pronoun's antecedent, and a full-sentence representation encodes the sentence's head word.
In all cases, we show that BERT's contextualized embeddings reflect the linguistic dependency being studied, and that BERT encodes these dependencies to a greater degree than it encodes less linguistically-salient controls.
- Score: 13.016284599828232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the name implies, contextualized representations of language are typically
motivated by their ability to encode context. Which aspects of context are
captured by such representations? We introduce an approach to address this
question using Representational Similarity Analysis (RSA). As case studies, we
investigate the degree to which a verb embedding encodes the verb's subject, a
pronoun embedding encodes the pronoun's antecedent, and a full-sentence
representation encodes the sentence's head word (as determined by a dependency
parse). In all cases, we show that BERT's contextualized embeddings reflect the
linguistic dependency being studied, and that BERT encodes these dependencies
to a greater degree than it encodes less linguistically-salient controls. These
results demonstrate the ability of our approach to adjudicate between
hypotheses about which aspects of context are encoded in representations of
language.
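To make the approach concrete, here is a minimal sketch of the core RSA computation in Python (using numpy and scipy); the random arrays are stand-ins for real BERT embeddings and for a hypothesis-driven reference model, and are assumptions for illustration rather than the authors' materials.

```python
# Minimal RSA sketch (illustrative, not the authors' exact pipeline):
# build a representational dissimilarity matrix (RDM) for each set of
# representations of the same stimuli, then correlate their upper triangles.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def rdm(representations: np.ndarray) -> np.ndarray:
    """Pairwise cosine dissimilarities between stimulus representations."""
    return squareform(pdist(representations, metric="cosine"))

def rsa_score(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Spearman correlation between the two RDMs' upper triangles."""
    rdm_a, rdm_b = rdm(reps_a), rdm(reps_b)
    iu = np.triu_indices_from(rdm_a, k=1)
    return spearmanr(rdm_a[iu], rdm_b[iu]).correlation

# Stand-in data: 20 stimuli, BERT verb embeddings vs. a hypothetical
# reference model that encodes each verb's subject.
bert_reps = np.random.randn(20, 768)
hypothesis_reps = np.random.randn(20, 8)
print(f"RSA (Spearman rho) = {rsa_score(bert_reps, hypothesis_reps):.3f}")
```

Because RSA compares similarity structure directly, no probe classifier is trained, which sidesteps the probe-capacity concerns that arise in supervised probing.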
Related papers
- Semantics or spelling? Probing contextual word embeddings with orthographic noise [4.622165486890317]
It remains unclear exactly what information is encoded in the hidden states of pretrained language models (PLMs).
Surprisingly, we find that contextual word embeddings (CWEs) generated by popular PLMs are highly sensitive to noise in input data.
This suggests that CWEs capture information unrelated to word-level meaning and can be manipulated through trivial modifications of the input, as the sketch below illustrates.
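The following is a minimal sketch of this kind of noise probe, not the paper's exact protocol; the model choice, mean-pooling, and example sentences are illustrative assumptions.

```python
# Hedged sketch: measure how far a contextual embedding moves when the input
# is orthographically perturbed (mean-pooling over tokens is a crude proxy
# used here for brevity).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the final-layer hidden states into a single vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

clean = embed("The doctor examined the patient.")
noised = embed("The docter examined the patient.")  # one-character misspelling
similarity = torch.nn.functional.cosine_similarity(clean, noised, dim=0)
print(f"cosine(clean, noised) = {similarity.item():.3f}")
```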
arXiv Detail & Related papers (2024-08-08T02:07:25Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
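As a rough illustration of this pipeline (the prompt wording and the model are my assumptions; the paper uses a more capable LLM and its own prompts), one could elicit inferred propositions like this:

```python
# Hypothetical sketch: prompt a language model for propositions a reader
# could infer from a text. gpt2 is only a lightweight stand-in here and
# will not follow instructions as well as the large models the paper uses.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = (
    "Text: 'We can't keep letting them in.'\n"
    "Propositions a reader could reasonably infer from this text:\n- "
)
result = generator(prompt, max_new_tokens=60, do_sample=False)
print(result[0]["generated_text"])
```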
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- Transition-based Abstract Meaning Representation Parsing with Contextual Embeddings [0.0]
We study a way of combining two of the most successful routes to the meaning of language, statistical language models and symbolic semantics formalisms, in the task of semantic parsing.
We explore the utility of incorporating pretrained context-aware word embeddings, such as BERT and RoBERTa, in the problem of parsing.
arXiv Detail & Related papers (2022-06-13T15:05:24Z)
- Latent Topology Induction for Understanding Contextualized Representations [84.7918739062235]
We study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models.
We show there exists a network of latent states that summarize linguistic properties of contextualized representations.
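As a toy illustration of discrete latent states (this k-means discretization is a stand-in of mine, not the paper's induction method), one can cluster contextualized token states and inspect their occupancy:

```python
# Toy stand-in: discretize contextualized token embeddings into a small
# inventory of latent states with k-means and count tokens per state.
import numpy as np
from sklearn.cluster import KMeans

token_states = np.random.randn(500, 768)  # placeholder for real BERT token states
labels = KMeans(n_clusters=16, n_init=10, random_state=0).fit_predict(token_states)
print(np.bincount(labels))  # tokens assigned to each latent state
```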
arXiv Detail & Related papers (2022-06-03T11:22:48Z)
- Do Context-Aware Translation Models Pay the Right Attention? [61.25804242929533]
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so.
In this paper, we ask several questions, among them: what contexts do human translators use to resolve ambiguous words?
We introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations.
Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words.
arXiv Detail & Related papers (2021-05-14T17:32:24Z)
- Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction [61.4913233397155]
We show that BERT uses information about relative clause (RC) spans during agreement prediction, following the linguistically correct strategy.
We also find that counterfactual representations generated for a specific RC subtype influence number prediction in sentences with other RC subtypes, suggesting that information about RC boundaries is encoded abstractly in BERT's representations.
arXiv Detail & Related papers (2021-05-14T17:11:55Z)
- Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT [7.057643880514415]
We investigate how Multilingual BERT (mBERT) encodes grammar by examining how the higher-order grammatical feature of morphosyntactic alignment is manifested across the embedding spaces of different languages.
arXiv Detail & Related papers (2021-01-26T19:21:59Z)
- Pareto Probing: Trading Off Accuracy for Complexity [87.09294772742737]
We argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance.
Our experiments with dependency parsing reveal a wide gap in syntactic knowledge between contextual and non-contextual representations.
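A small sketch of the underlying Pareto idea in plain Python (the probe numbers are hypothetical, and this is not the paper's exact metric): keep only probes that no other probe beats on both complexity and accuracy.

```python
# Pareto-front sketch: a probe is kept iff no other probe is at least as
# simple and at least as accurate (and differs from it).
from typing import List, Tuple

def pareto_front(probes: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """probes are (complexity, accuracy); lower complexity and higher accuracy win."""
    return sorted(
        (c, a)
        for c, a in probes
        if not any(c2 <= c and a2 >= a and (c2, a2) != (c, a) for c2, a2 in probes)
    )

# Hypothetical probe results: (parameter count, dev accuracy)
probes = [(1e3, 0.71), (1e4, 0.78), (1e5, 0.79), (1e4, 0.74), (1e6, 0.79)]
print(pareto_front(probes))  # [(1000.0, 0.71), (10000.0, 0.78), (100000.0, 0.79)]
```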
arXiv Detail & Related papers (2020-10-05T17:27:31Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate whether entity types can be inferred from the context alone and find that, although people also fail to infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)