When Does Syntax Mediate Neural Language Model Performance? Evidence from Dropout Probes
- URL: http://arxiv.org/abs/2204.09722v1
- Date: Wed, 20 Apr 2022 18:09:36 GMT
- Title: When Does Syntax Mediate Neural Language Model Performance? Evidence from Dropout Probes
- Authors: Mycal Tucker, Tiwalayo Eisape, Peng Qian, Roger Levy, and Julie Shah
- Abstract summary: We show that models encode syntactic information redundantly and introduce a new probe design that guides probes to consider all syntactic information present in embeddings.
We find evidence for the use of syntax in models where prior methods did not, allowing us to boost model performance by injecting syntactic information into representations.
- Score: 27.70448935595472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent causal probing literature reveals when language models and syntactic
probes use similar representations. Such techniques may yield "false negative"
causality results: models may use representations of syntax, but probes may
have learned to use redundant encodings of the same syntactic information. We
demonstrate that models do encode syntactic information redundantly and
introduce a new probe design that guides probes to consider all syntactic
information present in embeddings. Using these probes, we find evidence for the
use of syntax in models where prior methods did not, allowing us to boost model
performance by injecting syntactic information into representations.
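The paper's dropout probes are not reproduced here, but the core idea lends itself to a short sketch: applying dropout to the embedding inputs of a structural probe discourages the probe from latching onto any single redundant copy of the syntactic signal. A minimal PyTorch sketch, assuming a Hewitt-and-Manning-style distance probe; all names and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

class DropoutProbe(nn.Module):
    """Structural probe with dropout on its input.

    Randomly zeroing embedding dimensions during training prevents the
    probe from committing to a single (possibly redundant) encoding of
    syntax, nudging it to pick up all copies of the signal.
    """
    def __init__(self, embed_dim: int, probe_rank: int, p: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(p)   # applied to embeddings, not weights
        self.proj = nn.Linear(embed_dim, probe_rank, bias=False)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embed_dim)
        h = self.proj(self.dropout(embeddings))
        # Predict squared L2 distances between all pairs of words,
        # as in a distance-style structural probe.
        diff = h.unsqueeze(2) - h.unsqueeze(1)   # (batch, seq, seq, rank)
        return (diff ** 2).sum(-1)               # (batch, seq, seq)

# Toy usage with random "embeddings"; a real run would use BERT hidden
# states and gold parse-tree distances as the regression target.
probe = DropoutProbe(embed_dim=768, probe_rank=64)
fake_states = torch.randn(2, 10, 768)
pred_distances = probe(fake_states)   # (2, 10, 10)
```

During training only the probe parameters are updated; at test time dropout is disabled as usual.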
Related papers
- Generating Enhanced Negatives for Training Language-Based Object Detectors [86.1914216335631]
We propose to leverage the vast knowledge built into modern generative models to automatically build negatives that are more relevant to the original data.
Specifically, we use large language models to generate negative text descriptions, and text-to-image diffusion models to generate corresponding negative images.
Our experimental analysis confirms the relevance of the generated negative data, and its use in language-based detectors improves performance on two complex benchmarks.
arXiv Detail & Related papers (2023-12-29T23:04:00Z)
- Generating Prototypes for Contradiction Detection Using Large Language Models and Linguistic Rules [1.6497679785422956]
We introduce a novel data generation method for contradiction detection.
We instruct the generative models to create contradicting statements with respect to descriptions of specific contradiction types.
As an auxiliary approach, we use linguistic rules to construct simple contradictions.
arXiv Detail & Related papers (2023-10-23T09:07:27Z)
- Probing for Incremental Parse States in Autoregressive Language Models [9.166953511173903]
Next-word predictions from autoregressive neural language models show remarkable sensitivity to syntax.
This work evaluates the extent to which this behavior arises as a result of a learned ability to maintain implicit representations of incremental syntactic structures.
arXiv Detail & Related papers (2022-11-17T18:15:31Z)
- Probing via Prompting [71.7904179689271]
This paper introduces a novel model-free approach to probing, by formulating probing as a prompting task.
We conduct experiments on five probing tasks and show that our approach is comparable to, or better than, diagnostic probes at extracting information.
We then examine the usefulness of a specific linguistic property for pre-training by removing the heads that are essential to that property and evaluating the resulting model's performance on language modeling.
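As a toy illustration of prompt-based probing (the paper's actual prompt designs and head-pruning setup are not reproduced here), one can query a masked language model directly for a linguistic property instead of training a diagnostic classifier on its hidden states; the template below is a hypothetical example:

```python
from transformers import pipeline

# Hypothetical prompt-based probe for part of speech: query the model
# through a cloze prompt rather than a trained diagnostic classifier.
fill = pipeline("fill-mask", model="bert-base-uncased")
template = 'In the sentence "the dog runs fast", the word "runs" is a [MASK].'
for candidate in fill(template, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```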
arXiv Detail & Related papers (2022-07-04T22:14:40Z)
- BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
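Grammar-constrained decoding of this kind can be pictured with a small sketch: at each step, logits for tokens the grammar disallows are masked out before selection. The `allowed_next` function below is a hypothetical stand-in for a real parser state, not BenchCLAMP's API:

```python
import torch

def allowed_next(prefix_ids: list[int]) -> set[int]:
    # Hypothetical: a real implementation would consult a CFG or
    # parser state derived from the prefix decoded so far.
    return {0, 1, 2}

def constrained_step(logits: torch.Tensor, prefix_ids: list[int]) -> int:
    # Mask disallowed tokens to -inf so they can never be selected.
    mask = torch.full_like(logits, float("-inf"))
    mask[list(allowed_next(prefix_ids))] = 0.0
    return int(torch.argmax(logits + mask))

vocab_logits = torch.randn(100)
next_id = constrained_step(vocab_logits, prefix_ids=[5, 7])
assert next_id in allowed_next([5, 7])
```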
arXiv Detail & Related papers (2022-06-21T18:34:11Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
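A minimal sketch of the contrastive idea, with a toy model standing in for a real LM: instead of attributing the target token's score alone, attribute the difference between the target and a specific foil (e.g., a verb with the wrong number):

```python
import torch
import torch.nn as nn

# Contrastive saliency sketch: explain why the model preferred the
# target token over a foil, not why it predicted the target in
# isolation. Model and data are toy stand-ins.
torch.manual_seed(0)
vocab, dim, seq = 50, 16, 6
embed = nn.Embedding(vocab, dim)
lm_head = nn.Linear(dim, vocab)

tokens = torch.randint(vocab, (seq,))
x = embed(tokens)                 # (seq, dim)
x.retain_grad()
logits = lm_head(x.mean(0))       # crude "LM": mean-pool then project

target, foil = 3, 7               # e.g., "is" vs "are"
contrast = logits[target] - logits[foil]
contrast.backward()

# Per-token saliency for the contrastive decision:
saliency = x.grad.norm(dim=-1)
print(saliency)
```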
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- On The Ingredients of an Effective Zero-shot Semantic Parser [95.01623036661468]
We analyze zero-shot semantic parsers trained by paraphrasing canonical utterances and programs generated from a grammar.
We propose bridging the gaps between such canonical examples and natural user utterances using improved grammars, stronger paraphrasers, and efficient learning methods.
Our model achieves strong performance on two semantic parsing benchmarks (Scholar, Geo) with zero labeled data.
arXiv Detail & Related papers (2021-10-15T21:41:16Z)
- What if This Modified That? Syntactic Interventions via Counterfactual Embeddings [19.3614797257652]
Prior art aims to uncover meaningful properties within model representations via probes, but it is unclear how faithfully such probes portray information that the models actually use.
We propose a technique, inspired by causal analysis, for generating counterfactual embeddings within models.
In experiments testing our technique, we produce evidence that some BERT-based models use a tree-distance-like representation of syntax in downstream prediction tasks.
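The technique can be sketched as gradient-based editing of hidden states against a frozen probe: keep the probe fixed and update a copy of the embeddings until the probe reads out the alternative (counterfactual) parse. Everything below is an illustrative stand-in, not the authors' released code:

```python
import torch
import torch.nn as nn

# Counterfactual-embedding sketch: optimize the embeddings (not the
# probe) so a frozen distance probe predicts a desired parse.
torch.manual_seed(0)
seq, dim, rank = 8, 768, 64
probe = nn.Linear(dim, rank, bias=False)
for p in probe.parameters():
    p.requires_grad_(False)           # the probe stays fixed

states = torch.randn(seq, dim)        # original hidden states (placeholder)
counterfactual = states.clone().requires_grad_(True)
target_dist = torch.rand(seq, seq)    # distances of the desired parse (placeholder)

opt = torch.optim.Adam([counterfactual], lr=0.01)
for _ in range(100):
    h = probe(counterfactual)
    diff = h.unsqueeze(1) - h.unsqueeze(0)
    pred = (diff ** 2).sum(-1)
    loss = (pred - target_dist).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# `counterfactual` can now be swapped back into the model to test
# whether downstream predictions change as the new parse would predict.
```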
arXiv Detail & Related papers (2021-05-28T17:27:04Z)
- Bird's Eye: Probing for Linguistic Graph Structures with a Simple Information-Theoretic Approach [23.66191446048298]
We propose a new information-theoretic probe, Bird's Eye, for detecting if and how representations encode the information in linguistic graphs.
We also propose an approach to use our probe to investigate localized linguistic information in the linguistic graphs using perturbation analysis.
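The information-theoretic flavor can be approximated in a few lines: a bilinear classifier predicts gold graph edges from pairs of token vectors, and its cross-entropy yields a variational bound related to the mutual information between the representations and the graph. All data below are random placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Edge-prediction probe sketch: score every token pair with a bilinear
# layer and fit it to the gold linguistic graph. Lower cross-entropy
# suggests the representations carry more information about the graph.
torch.manual_seed(0)
seq, dim = 12, 64
h = torch.randn(seq, dim)                     # token representations (placeholder)
edges = (torch.rand(seq, seq) > 0.8).float()  # gold graph (placeholder)

scorer = nn.Bilinear(dim, dim, 1)
opt = torch.optim.Adam(scorer.parameters(), lr=0.01)
pairs_l = h.unsqueeze(1).expand(seq, seq, dim).reshape(-1, dim)
pairs_r = h.unsqueeze(0).expand(seq, seq, dim).reshape(-1, dim)
labels = edges.reshape(-1)
for _ in range(200):
    logits = scorer(pairs_l, pairs_r).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"edge cross-entropy: {loss.item():.3f}")
```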
arXiv Detail & Related papers (2021-05-06T13:01:57Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
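A minimal sketch of the graph-encoder step (the paper's exact architecture is not reproduced here): mix each token's state with the states of its neighbors in the semantic dependency graph before the task head. The adjacency matrix below is a random placeholder for a real parse:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer over token states."""
    def __init__(self, dim: int):
        super().__init__()
        self.self_w = nn.Linear(dim, dim)
        self.neigh_w = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (seq, dim); adj: (seq, seq) with adj[i, j] = 1 for edge j -> i
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        neigh = adj @ h / deg                  # mean over graph neighbors
        return torch.relu(self.self_w(h) + self.neigh_w(neigh))

layer = GraphConv(dim=32)
states = torch.randn(5, 32)                    # encoder outputs (placeholder)
adjacency = (torch.rand(5, 5) > 0.7).float()   # dependency graph (placeholder)
fused = layer(states, adjacency)               # (5, 32), fed to the task head
```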
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.