Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing
- URL: http://arxiv.org/abs/2106.02559v1
- Date: Fri, 4 Jun 2021 15:46:39 GMT
- Title: Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing
- Authors: Rowan Hall Maudslay, Ryan Cotterell
- Abstract summary: We show that semantic cues in training data mean that syntactic probes do not properly isolate syntax.
We train the probes on several popular language models.
- Score: 45.834234634602566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analysing whether neural language models encode linguistic information has
become popular in NLP. One method of doing so, which is frequently cited to
support the claim that models like BERT encode syntax, is called probing;
probes are small supervised models trained to extract linguistic information
from another model's output. If a probe is able to predict a particular
structure, it is argued that the model whose output it is trained on must have
implicitly learnt to encode it. However, drawing a generalisation about a
model's linguistic knowledge of a specific phenomenon based on what a probe
is able to learn may be problematic: in this work, we show that semantic cues
in the training data mean that syntactic probes do not properly isolate syntax. We
generate a new corpus of semantically nonsensical but syntactically well-formed
Jabberwocky sentences, which we use to evaluate two probes trained on normal
data. We train the probes on several popular language models (BERT, GPT, and
RoBERTa), and find that in all settings they perform worse when evaluated on
these data, for one probe by an average of 15.4 UUAS points absolute. Although
in most cases they still outperform the baselines, their lead is reduced
substantially, e.g. by 53% in the case of BERT for one probe. This raises the
question: what empirical scores constitute knowing syntax?
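
To make the probing setup concrete, here is a minimal sketch of a structural probe in the style of Hewitt and Manning (2019), the kind of probe whose UUAS scores are reported above: a linear map is trained so that squared distances between transformed contextual vectors approximate distances between words in the gold parse tree. The class and data below are illustrative stand-ins, not the authors' implementation.

```python
import torch

class StructuralProbe(torch.nn.Module):
    """Sketch of a structural (distance) probe: a learned linear map B such
    that squared L2 distances between transformed word vectors approximate
    distances between words in the syntactic parse tree."""

    def __init__(self, model_dim: int, probe_rank: int = 64):
        super().__init__()
        self.B = torch.nn.Parameter(torch.randn(model_dim, probe_rank) * 0.01)

    def forward(self, reps: torch.Tensor) -> torch.Tensor:
        # reps: (seq_len, model_dim) contextual vectors, e.g. one BERT layer
        transformed = reps @ self.B                      # (seq_len, rank)
        diffs = transformed.unsqueeze(1) - transformed.unsqueeze(0)
        return diffs.pow(2).sum(-1)                      # pairwise sq. distances

# Hypothetical training step: fit predicted distances to gold tree distances
# with an L1 loss; at test time, UUAS is the fraction of gold undirected tree
# edges recovered by a minimum spanning tree over the predicted distances.
probe = StructuralProbe(model_dim=768)
reps = torch.randn(10, 768)                    # stand-in contextual vectors
gold = torch.randint(1, 10, (10, 10)).float()  # stand-in tree distances
loss = (probe(reps) - gold).abs().mean()
loss.backward()
```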
Related papers
- Probing for targeted syntactic knowledge through grammatical error detection [13.653209309144593]
We propose grammatical error detection as a diagnostic probe to evaluate pre-trained English language models.
We leverage public annotated training data from both English second language learners and Wikipedia edits.
We find that masked language models linearly encode information relevant to detecting subject-verb agreement (SVA) errors, while autoregressive models perform on par with our baseline.
arXiv Detail & Related papers (2022-10-28T16:01:25Z)
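
As a hedged illustration of the linear-encoding claim in the grammatical-error-detection entry above: a diagnostic probe of this kind can be as simple as a logistic regression over token representations. The data below is random stand-in data, not the paper's learner or Wikipedia corpora.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data: token-level representations from a masked LM,
# labelled 1 where the token participates in a subject-verb agreement error.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 768))       # contextual vectors (e.g. BERT)
y = rng.integers(0, 2, size=5000)      # error / no-error labels

# A linear probe: if it beats a majority-class baseline, the information is
# linearly recoverable from the representations.
probe = LogisticRegression(max_iter=1000).fit(X[:4000], y[:4000])
print(f"held-out accuracy: {probe.score(X[4000:], y[4000:]):.3f}")
```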
- Order-sensitive Shapley Values for Evaluating Conceptual Soundness of NLP Models [13.787554178089444]
Order-sensitive Shapley Values (OSV) is an explanation method for sequential data.
We show that OSV is more faithful in explaining model behavior than gradient-based methods.
We also show that OSV can be leveraged to generate adversarial examples.
arXiv Detail & Related papers (2022-06-01T02:30:12Z)
- Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real-world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z)
- Deep Clustering of Text Representations for Supervision-free Probing of Syntax [51.904014754864875]
We consider part-of-speech induction (POSI) and constituency labelling (CoLab) in this work.
We find that Multilingual BERT (mBERT) contains a surprising amount of syntactic knowledge of English.
We report competitive performance of our probe on 45-tag English POSI, state-of-the-art performance on 12-tag POSI across 10 languages, and competitive results on CoLab.
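
As a rough, supervision-free illustration of the entry above: the sketch below substitutes plain k-means for the paper's deep clustering, grouping (stand-in) token vectors and scoring with many-to-one accuracy against gold tags. All data and numbers here are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in data: contextual token vectors (e.g. from mBERT)
# and their gold POS tags for a 12-tag POSI setting.
rng = np.random.default_rng(0)
token_reps = rng.normal(size=(1000, 768))
gold_tags = rng.integers(0, 12, size=1000)

# Cluster representations into as many groups as the tagset.
clusters = KMeans(n_clusters=12, n_init=10, random_state=0).fit_predict(token_reps)

# Many-to-one accuracy: map each cluster to its most frequent gold tag.
m2o = sum(
    np.bincount(gold_tags[clusters == c]).max()
    for c in np.unique(clusters)
) / len(gold_tags)
print(f"many-to-one accuracy: {m2o:.3f}")
```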
arXiv Detail & Related papers (2020-10-24T05:06:29Z)
- A Tale of a Probe and a Parser [74.14046092181947]
Measuring what linguistic information is encoded in neural models of language has become popular in NLP.
Researchers approach this enterprise by training "probes" - supervised models designed to extract linguistic structure from another model's output.
One such probe is the structural probe, designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
arXiv Detail & Related papers (2020-05-04T16:57:31Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
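
In rough form (notation assumed here, following the entry above), this operationalization treats probing a property $T$ from representations $R$ as estimating the mutual information

$$\mathrm{I}(T; R) = \mathrm{H}(T) - \mathrm{H}(T \mid R),$$

where the probe's cross-entropy upper-bounds $\mathrm{H}(T \mid R)$, so a better probe yields a tighter estimate.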
- Information-Theoretic Probing with Minimum Description Length [74.29846942213445]
We propose an alternative to standard probes: information-theoretic probing with minimum description length (MDL).
With MDL probing, training a probe to predict labels is recast as teaching it to effectively transmit the data.
We show that these methods agree in results and are more informative and stable than the standard probes.
arXiv Detail & Related papers (2020-03-27T09:35:38Z)
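
As a sketch of the MDL idea in the entry above (the standard online/prequential formulation; notation assumed): with $K$ label classes and data blocks of sizes $t_1 < t_2 < \dots < t_S = N$, the online codelength is

$$L_{\text{online}}(y \mid x) = t_1 \log_2 K - \sum_{i=1}^{S-1} \log_2 p_{\theta_i}\!\bigl(y_{t_i+1:t_{i+1}} \mid x_{t_i+1:t_{i+1}}\bigr),$$

where $\theta_i$ is the probe trained on the first $t_i$ examples; a shorter code means the labels are easier to learn from the representations.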
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.