Information-Theoretic Probing for Linguistic Structure
- URL: http://arxiv.org/abs/2004.03061v2
- Date: Fri, 22 May 2020 21:58:58 GMT
- Title: Information-Theoretic Probing for Linguistic Structure
- Authors: Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina
Williams, Ryan Cotterell
- Abstract summary: We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
- Score: 74.04862204427944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of neural networks on a diverse set of NLP tasks has led
researchers to question how much these networks actually "know" about natural
language. Probes are a natural way of assessing this. When probing, a
researcher chooses a linguistic task and trains a supervised model to predict
annotations in that linguistic task from the network's learned representations.
If the probe does well, the researcher may conclude that the representations
encode knowledge related to the task. A commonly held belief is that using
simpler models as probes is better; the logic is that simpler models will
identify linguistic structure, but not learn the task itself. We propose an
information-theoretic operationalization of probing as estimating mutual
information that contradicts this received wisdom: one should always select the
highest performing probe one can, even if it is more complex, since it will
result in a tighter estimate, and thus reveal more of the linguistic
information inherent in the representation. The experimental portion of our
paper focuses on empirically estimating the mutual information between a
linguistic property and BERT, comparing these estimates to several baselines.
We evaluate on a set of ten typologically diverse languages often
underrepresented in NLP research, plus English, totalling eleven languages.
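The estimator behind this argument can be sketched in a few lines: since I(R; Y) = H(Y) - H(Y | R) and the cross-entropy of any probe q(y | r) upper-bounds H(Y | R), subtracting the best available probe's cross-entropy from a plug-in estimate of H(Y) gives a lower bound on the mutual information, so a stronger probe yields a tighter (larger) bound. Below is a minimal PyTorch sketch of that framing, assuming frozen representations R and integer property labels y have already been extracted; the MLP probe, hyperparameters, and function names are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: probing as a lower bound on mutual information.
# Assumptions (not from the paper): frozen representations R with shape
# (n, d) and integer labels y with shape (n,) are already extracted; the
# probe architecture, hyperparameters, and names are illustrative.
import math
from collections import Counter

import torch
import torch.nn as nn


def label_entropy(y):
    """Plug-in estimate of H(Y) in nats from the empirical label distribution."""
    counts = Counter(y.tolist())
    n = len(y)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def probe_cross_entropy(R, y, hidden=256, epochs=50, lr=1e-3):
    """Train a small MLP probe q(y|r); its cross-entropy upper-bounds H(Y|R).

    (In practice this term would be measured on held-out data.)
    """
    num_classes = int(y.max().item()) + 1
    probe = nn.Sequential(nn.Linear(R.shape[1], hidden), nn.ReLU(),
                          nn.Linear(hidden, num_classes))
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # mean negative log-likelihood, in nats
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(R), y)
        loss.backward()
        opt.step()
    return loss.item()


def mi_lower_bound(R, y):
    """I(R; Y) = H(Y) - H(Y|R) >= H(Y) - CE(probe): better probes tighten the bound."""
    return label_entropy(y) - probe_cross_entropy(R, y)
```

The design point stressed in the abstract falls out directly: any slack in the probe's cross-entropy only loosens the bound, so choosing a deliberately weaker probe can only understate the linguistic information present in the representation.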
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - Probing via Prompting [71.7904179689271]
This paper introduces a novel model-free approach to probing by formulating probing as a prompting task.
We conduct experiments on five probing tasks and show that our approach is comparable to or better than diagnostic probes at extracting information.
We then examine the usefulness of a specific linguistic property for pre-training by removing the heads that are essential to that property and evaluating the resulting model's performance on language modeling.
arXiv Detail & Related papers (2022-07-04T22:14:40Z) - Is neural language acquisition similar to natural? A chronological
probing study [0.0515648410037406]
We present a chronological probing study of English transformer models such as MultiBERT and T5.
We compare the linguistic information the models acquire over the course of training on their corpora.
The results show that (1) linguistic information is acquired in the early stages of training, and (2) both language models capture features from various levels of language.
arXiv Detail & Related papers (2022-07-01T17:24:11Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Probing Across Time: What Does RoBERTa Know and When? [70.20775905353794]
We show that linguistic knowledge is acquired fast, stably, and robustly across domains. Facts and commonsense are slower and more domain-sensitive.
We believe that probing-across-time analyses can help researchers understand the complex, intermingled learning that these models undergo and guide us toward more efficient approaches that accomplish necessary learning faster.
arXiv Detail & Related papers (2021-04-16T04:26:39Z) - Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z) - Probing the Probing Paradigm: Does Probing Accuracy Entail Task
Relevance? [27.64235687067883]
We show that models can learn to encode linguistic properties even if they are not needed for the task on which the model was trained.
We demonstrate that models can encode these properties considerably above chance level even when they are distributed in the data as random noise (a minimal control-probe sketch in this spirit follows this list).
arXiv Detail & Related papers (2020-05-02T06:19:20Z)
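The random-noise control mentioned in the last entry can be approximated with a short sanity check. Caveat: that paper distributes the noise property in the model's training data before probing, whereas the sketch below only probes frozen representations against freshly drawn random labels, so it illustrates the simpler control-task idea rather than that paper's exact protocol; every name and setting here is an assumption.

```python
# Minimal random-label control probe (a sanity check, not the cited paper's
# full protocol): train a linear probe on labels drawn as pure noise and
# compare train vs. held-out accuracy against chance (1 / num_classes).
import torch
import torch.nn as nn


def random_label_control(R, num_classes=10, train_frac=0.8, epochs=200, lr=1e-2):
    """Return (train_acc, heldout_acc) for a linear probe fit to random labels.

    Train accuracy far above chance indicates the probe can memorize
    arbitrary labelings; held-out accuracy should sit near chance because
    the random labels carry no information about R.
    """
    n, d = R.shape
    y = torch.randint(0, num_classes, (n,))  # property distributed as noise
    split = int(train_frac * n)
    probe = nn.Linear(d, num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(probe(R[:split]), y[:split]).backward()
        opt.step()
    with torch.no_grad():
        train_acc = (probe(R[:split]).argmax(-1) == y[:split]).float().mean().item()
        heldout_acc = (probe(R[split:]).argmax(-1) == y[split:]).float().mean().item()
    return train_acc, heldout_acc
```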
This list is automatically generated from the titles and abstracts of the papers listed on this site.