Related papers: Probing Syntax in Large Language Models: Successes and Remaining Challenges

URL: http://arxiv.org/abs/2508.03211v1
Date: Tue, 05 Aug 2025 08:41:14 GMT
Title: Probing Syntax in Large Language Models: Successes and Remaining Challenges
Authors: Pablo J. Diego-Simón, Emmanuel Chemla, Jean-Rémi King, Yair Lakretz
Abstract summary: It remains unclear whether structural and/or statistical factors systematically affect these syntactic representations. We conduct an in-depth analysis of structural probes on three controlled benchmarks.
Score: 7.9494253785082405
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The syntactic structures of sentences can be readily read out from the activations of large language models (LLMs). However, the "structural probes" that have been developed to reveal this phenomenon are typically evaluated on an indiscriminate set of sentences. Consequently, it remains unclear whether structural and/or statistical factors systematically affect these syntactic representations. To address this issue, we conduct an in-depth analysis of structural probes on three controlled benchmarks. Our results are three-fold. First, structural probes are biased by a superficial property: the closer two words are in a sentence, the more likely structural probes are to consider them syntactically linked. Second, structural probes are challenged by linguistic properties: they poorly represent deep syntactic structures, and suffer interference from interacting nouns or ungrammatical verb forms. Third, structural probes do not appear to be affected by the predictability of individual words. Overall, this work sheds light on the current challenges faced by structural probes and provides a benchmark of controlled stimuli to better evaluate their performance.

Related papers

Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations [33.04242471060053]
Large Language Models (LLMs) exhibit a robust mastery of syntax when processing and generating text. No comprehensive study has yet established whether a model's probing accuracy reliably predicts its downstream syntactic performance.
arXiv Detail & Related papers (2025-06-20T01:46:50Z)
Fundamental Principles of Linguistic Structure are Not Represented by o3 [3.335047764053173]
o3 model fails to generalize basic phrase structure rules. It fails to correctly rate and explain acceptability dynamics. It fails to distinguish between instructions to generate unacceptable semantic vs. unacceptable syntactic outputs.
arXiv Detail & Related papers (2025-02-15T23:53:31Z)
Linguistic Structure Induction from Language Models [1.8130068086063336]
This thesis focuses on producing constituency and dependency structures from Language Models (LMs) in an unsupervised setting. I present a detailed study on StructFormer (SF) which retrofits a transformer architecture with an encoder network to produce constituency and dependency structures. I present six experiments to analyze and address this field's challenges.
arXiv Detail & Related papers (2024-03-11T16:54:49Z)
How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored. Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges. We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
"You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure. We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure. Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z)
Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions. This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial. We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments. The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
Towards Robust NLG Bias Evaluation with Syntactically-diverse Prompts [38.69716232707304]
We present a robust methodology for evaluating biases in natural language generation (NLG) systems. Previous works use fixed hand-crafted prefix templates with mentions of various demographic groups to prompt models to generate continuations for bias analysis. To study this problem, we paraphrase the prompts with different syntactic structures and use these to evaluate demographic bias in NLG systems.
arXiv Detail & Related papers (2022-12-03T22:11:17Z)
Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task [70.29624135819884]
We study the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates. Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor is present.
arXiv Detail & Related papers (2022-04-14T11:33:15Z)
Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models [22.43510769150502]
It is not entirely clear what aspects of sentence-level syntax are captured by vector-based language representations. We show that Transformers build sensitivity to larger parts of the sentence along their layers, and that hierarchical phrase structure plays a role in this process.
arXiv Detail & Related papers (2021-04-15T16:30:31Z)
A Tale of a Probe and a Parser [74.14046092181947]
Measuring what linguistic information is encoded in neural models of language has become popular in NLP. Researchers approach this enterprise by training "probes" - supervised models designed to extract linguistic structure from another model's output. One such probe is the structural probe, designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
arXiv Detail & Related papers (2020-05-04T16:57:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.