Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
- URL: http://arxiv.org/abs/2005.00719v3
- Date: Sun, 7 Mar 2021 18:38:24 GMT
- Title: Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
- Authors: Abhilasha Ravichander, Yonatan Belinkov, Eduard Hovy
- Abstract summary: We show that models can learn to encode linguistic properties even if they are not needed for the task on which the model was trained.
We demonstrate that models can encode these properties considerably above chance level even when they are distributed in the data as random noise.
- Score: 27.64235687067883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although neural models have achieved impressive results on several NLP
benchmarks, little is understood about the mechanisms they use to perform
language tasks. Thus, much recent attention has been devoted to analyzing the
sentence representations learned by neural encoders, through the lens of
'probing' tasks. However, to what extent was the information encoded in
sentence representations, as discovered through a probe, actually used by the
model to perform its task? In this work, we examine this probing paradigm
through a case study in Natural Language Inference, showing that models can
learn to encode linguistic properties even if they are not needed for the task
on which the model was trained. We further identify that pretrained word
embeddings play a considerable role in encoding these properties rather than
the training task itself, highlighting the importance of careful controls when
designing probing experiments. Finally, through a set of controlled synthetic
tasks, we demonstrate that models can encode these properties considerably above
chance level even when they are distributed in the data as random noise, calling into
question the interpretation of absolute claims on probing tasks.
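To make the probing setup concrete, the sketch below illustrates the paper's central control (an illustration, not the authors' code): a small encoder is trained on a task label that is statistically independent of a binary "property" injected into the input, and a linear probe is then trained on the frozen representations to decode that property. Above-chance probe accuracy mirrors the paper's finding that probes can recover properties the task never required. All data, dimensions, and architecture choices here are assumptions.

```python
# Minimal probing-experiment sketch: does a probe recover a property
# that is random noise with respect to the training task?
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 4000, 32

# The task signal lives in coordinate 0; the "property" bit is injected
# into coordinate 1 but is uncorrelated with the task label.
task_y = rng.integers(0, 2, size=n)
prop_y = rng.integers(0, 2, size=n)           # independent of task_y
X = rng.normal(size=(n, d)).astype(np.float32)
X[:, 0] += 2.0 * task_y                       # task-relevant feature
X[:, 1] += 2.0 * prop_y                       # task-irrelevant property

encoder = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 16))
head = nn.Linear(16, 2)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

Xt = torch.from_numpy(X)
yt = torch.from_numpy(task_y).long()
for _ in range(200):                          # train on the task only
    opt.zero_grad()
    loss_fn(head(encoder(Xt)), yt).backward()
    opt.step()

# Probe the frozen representations for the task-irrelevant property.
with torch.no_grad():
    H = encoder(Xt).numpy()
split = n // 2
probe = LogisticRegression(max_iter=1000).fit(H[:split], prop_y[:split])
acc = probe.score(H[split:], prop_y[split:])
print(f"probe accuracy on task-irrelevant property: {acc:.2f} (chance = 0.50)")
```

Since nothing in training pushes the encoder to discard coordinate 1, the probe typically decodes the property well above chance, which is exactly why above-chance probing accuracy alone does not entail task relevance.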
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- Topic Aware Probing: From Sentence Length Prediction to Idiom Identification: How Reliant are Neural Language Models on Topic? [1.816169926868157]
We study the relationship between Transformer-based models' (BERT's and RoBERTa's) performance on a range of probing tasks in English and their reliance on topic information.
Our results indicate that Transformer-based models encode both topic and non-topic information in their intermediate layers.
Our analysis of these models' performance on other standard probing tasks suggests that tasks that are relatively insensitive to the topic information are also tasks that are relatively difficult for these models.
arXiv Detail & Related papers (2024-03-04T13:10:08Z)
- An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks [112.1942546460814]
We report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM).
Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models.
arXiv Detail & Related papers (2022-03-31T03:26:55Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models [7.386862225828819]
This work presents a novel dataset of naturally occurring sentences containing multiword expressions (MWEs), manually classified into a fine-grained set of meanings.
We use this dataset in two tasks designed to test i) a language model's ability to detect idiom usage, and ii) the effectiveness of a language model in generating representations of sentences containing idioms.
arXiv Detail & Related papers (2021-09-09T16:53:17Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- An Investigation of Language Model Interpretability via Sentence Editing [5.492504126672887]
We re-purpose a sentence editing dataset as a testbed for the interpretability of pre-trained language models (PLMs).
This enables us to conduct a systematic investigation on an array of questions regarding PLMs' interpretability.
The investigation generates new insights, for example, contrary to the common understanding, we find that attention weights correlate well with human rationales.
arXiv Detail & Related papers (2020-11-28T00:46:43Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information (a toy illustration of this estimator appears after this list).
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields results comparable to or better than state-of-the-art zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
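The information-theoretic probing entry above frames probe quality as a bound on mutual information: I(R; Y) = H(Y) - H(Y|R), where a trained probe's held-out cross-entropy upper-bounds H(Y|R), so H(Y) minus the probe's cross-entropy lower-bounds the mutual information. The sketch below illustrates that estimator on synthetic data; the data, probe family, and dimensions are assumptions for illustration, not that paper's setup.

```python
# Toy mutual-information estimate via a probe's held-out cross-entropy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n, d = 2000, 16
Y = rng.integers(0, 2, size=n)       # binary linguistic property (synthetic)
R = rng.normal(size=(n, d))          # "representations"
R[:, 0] += 1.5 * Y                   # representations encode Y weakly

split = n // 2
probe = LogisticRegression(max_iter=1000).fit(R[:split], Y[:split])
ce = log_loss(Y[split:], probe.predict_proba(R[split:]))  # >= H(Y|R), in nats

p = np.bincount(Y) / n
h_y = -(p * np.log(p)).sum()         # H(Y) in nats
mi_lower_bound = max(h_y - ce, 0.0)  # lower bound on I(R; Y)
print(f"H(Y)={h_y:.3f} nats  probe CE={ce:.3f}  MI lower bound={mi_lower_bound:.3f}")
```

A stronger probe family can only tighten the bound, which is why this framing sidesteps the probe-complexity debates raised by accuracy-based probing.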
This list is automatically generated from the titles and abstracts of the papers in this site.