Derivational Probing: Unveiling the Layer-wise Derivation of Syntactic Structures in Neural Language Models
- URL: http://arxiv.org/abs/2506.21861v1
- Date: Fri, 27 Jun 2025 02:29:30 GMT
- Title: Derivational Probing: Unveiling the Layer-wise Derivation of Syntactic Structures in Neural Language Models
- Authors: Taiga Someya, Ryo Yoshida, Hitomi Yanaka, Yohei Oseki
- Abstract summary: We propose Derivational Probing to investigate how micro-syntactic structures and macro-syntactic structures are constructed. Our experiments on BERT reveal a clear bottom-up derivation: micro-syntactic structures emerge in lower layers and are gradually integrated into a coherent macro-syntactic structure in higher layers.
- Score: 16.97687131562374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has demonstrated that neural language models encode syntactic structures in their internal representations, yet the derivations by which these structures are constructed across layers remain poorly understood. In this paper, we propose Derivational Probing to investigate how micro-syntactic structures (e.g., subject noun phrases) and macro-syntactic structures (e.g., the relationship between the root verbs and their direct dependents) are constructed as word embeddings propagate upward across layers. Our experiments on BERT reveal a clear bottom-up derivation: micro-syntactic structures emerge in lower layers and are gradually integrated into a coherent macro-syntactic structure in higher layers. Furthermore, a targeted evaluation on subject-verb number agreement shows that the timing of constructing macro-syntactic structures is critical for downstream performance, suggesting an optimal timing for integrating global syntactic information.
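As a rough illustration of the kind of layer-wise measurement described above, the sketch below scores how well each BERT layer recovers micro-syntactic arcs (inside the subject NP) versus macro-syntactic arcs (linking the root verb to its dependents). This is not the paper's method: an unsupervised minimum-spanning-tree baseline over token-embedding distances stands in for a trained structural probe, and the example sentence and gold dependency arcs are hand-coded assumptions.

```python
# Minimal sketch of layer-wise syntactic probing in the spirit of
# Derivational Probing. Assumptions (not from the paper): an MST over
# embedding distances replaces the trained probe, and the gold arcs
# below follow one illustrative dependency analysis.
import numpy as np
import torch
from scipy.sparse.csgraph import minimum_spanning_tree
from transformers import AutoModel, AutoTokenizer

SENT = "the keys to the cabinet are on the table"
# Hand-coded (head, dependent) word-index arcs; illustrative only.
GOLD_MICRO = {(1, 0), (1, 2), (2, 4), (4, 3)}  # arcs inside the subject NP
GOLD_MACRO = {(5, 1), (5, 6), (6, 8)}          # root verb and its dependents

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

enc = tok(SENT, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).hidden_states  # (embeddings, layer 1, ..., layer 12)

words = SENT.split()
# Map each word to its first subword position (word_ids gives None for [CLS]/[SEP]).
word_ids = enc.word_ids()
first_sub = [word_ids.index(i) for i in range(len(words))]

def recovered_arcs(layer_states):
    """Build an MST over pairwise L2 distances between word vectors."""
    v = layer_states[first_sub].numpy()
    d = np.linalg.norm(v[:, None] - v[None, :], axis=-1)
    mst = minimum_spanning_tree(d).toarray()
    n = len(words)
    return {(i, j) for i in range(n) for j in range(n) if mst[i, j] > 0}

def recall(gold, pred):
    """Fraction of gold arcs present in the MST, ignoring direction."""
    undirected = {frozenset(e) for e in pred}
    return sum(frozenset(e) in undirected for e in gold) / len(gold)

for layer, h in enumerate(hidden):
    arcs = recovered_arcs(h[0])
    print(f"layer {layer:2d}  micro recall {recall(GOLD_MICRO, arcs):.2f}  "
          f"macro recall {recall(GOLD_MACRO, arcs):.2f}")
```

Under the paper's finding, one would expect the micro score to saturate in lower layers and the macro score to rise only in higher layers; this toy baseline merely shows where such per-layer scores come from.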
Related papers
- Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations [33.04242471060053]
Large Language Models (LLMs) exhibit a robust mastery of syntax when processing and generating text. No comprehensive study has yet established whether a model's probing accuracy reliably predicts its downstream syntactic performance.
arXiv Detail & Related papers (2025-06-20T01:46:50Z) - Hierarchical Lexical Manifold Projection in Large Language Models: A Novel Mechanism for Multi-Scale Semantic Representation [0.0]
The integration of structured hierarchical embeddings into transformer-based architectures introduces a refined approach to lexical representation. A projection mechanism that maps tokens onto a structured manifold provides improved lexical alignment. The refined hierarchical organization of embeddings provides greater interpretability in lexical modeling.
arXiv Detail & Related papers (2025-02-08T00:49:32Z) - Compositional Structures in Neural Embedding and Interaction Decompositions [101.40245125955306]
We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks.
We introduce a characterization of compositional structures in terms of "interaction decompositions".
We establish necessary and sufficient conditions for the presence of such structures within the representations of a model.
arXiv Detail & Related papers (2024-07-12T02:39:50Z) - Linguistic Structure Induction from Language Models [1.8130068086063336]
This thesis focuses on producing constituency and dependency structures from Language Models (LMs) in an unsupervised setting.
I present a detailed study on StructFormer (SF), which retrofits a transformer architecture with an encoder network to produce constituency and dependency structures.
I present six experiments to analyze and address this field's challenges.
arXiv Detail & Related papers (2024-03-11T16:54:49Z) - Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network [29.149367323751413]
We propose ReStruct, a meta-structure search framework that integrates reasoning into the evolutionary procedure.
We show that ReStruct achieves state-of-the-art performance in both recommendation and node classification tasks.
arXiv Detail & Related papers (2024-02-18T09:21:12Z) - Unsupervised Chunking with Hierarchical RNN [62.15060807493364]
This paper introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner.
We present a two-layer Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions.
Experiments on the CoNLL-2000 dataset reveal a notable improvement over existing unsupervised methods, enhancing phrase F1 score by up to 6 percentage points.
arXiv Detail & Related papers (2023-09-10T02:55:12Z) - Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings and reasoning mechanisms is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn and reason over CFG-defined hierarchies and generate sentences based on them.
arXiv Detail & Related papers (2023-05-23T04:28:16Z) - Multi-Relational Hyperbolic Word Embeddings from Natural Language Definitions [5.763375492057694]
This paper presents a multi-relational model that explicitly leverages such a structure to derive word embeddings from definitions.
An empirical analysis demonstrates that the framework can help impose the desired structural constraints.
Experiments reveal the superiority of the hyperbolic word embeddings over their Euclidean counterparts.
arXiv Detail & Related papers (2023-05-12T08:16:06Z) - StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure [5.2869308707704255]
StrAE is a Structured Autoencoder framework that, through strict adherence to explicit structure, enables effective learning of multi-level representations.
We show that our results are directly attributable to the informativeness of the structure provided as input, and that this is not the case for existing tree models.
We then extend StrAE to allow the model to define its own compositions using a simple localised-merge algorithm.
arXiv Detail & Related papers (2023-05-09T16:20:48Z) - Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach that models structures as sequences of actions, predicted autoregressively by pretrained language models (PLMs).
Our approach achieves new state-of-the-art results on all the structured prediction tasks we evaluated.
arXiv Detail & Related papers (2022-10-26T13:27:26Z) - Model Criticism for Long-Form Text Generation [113.13900836015122]
We apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text.
We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality.
We find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
arXiv Detail & Related papers (2022-10-16T04:35:58Z) - Probing for Constituency Structure in Neural Language Models [11.359403179089817]
We focus on constituent structure as represented in the Penn Treebank (PTB).
We find that 4 pretrained transformer LMs obtain high performance on our probing tasks.
We show that a complete constituency tree can be linearly separated from LM representations (a toy sketch of span-level linear probing appears after this list).
arXiv Detail & Related papers (2022-04-13T07:07:37Z) - Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show that structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
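As referenced in the "Probing for Constituency Structure" entry above, the sketch below shows one way "linearly separable" constituency information can be tested: a linear classifier distinguishes constituent spans from non-constituent spans using LM representations. This is not that paper's probe; the endpoint-concatenation span features, the hand-labeled toy examples, and the logistic-regression classifier are illustrative assumptions.

```python
# Toy sketch of span-level linear probing for constituency, under the
# assumptions stated above (not the paper's actual setup).
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# (sentence, (start_word, end_word) inclusive, is_constituent) triples,
# hand-labeled for illustration only.
EXAMPLES = [
    ("the old dog slept on the porch", (0, 2), 1),  # "the old dog" = NP
    ("the old dog slept on the porch", (4, 6), 1),  # "on the porch" = PP
    ("the old dog slept on the porch", (2, 4), 0),  # "dog slept on" = not a constituent
    ("a cat chased the mouse", (2, 4), 1),          # "chased the mouse" = VP
    ("a cat chased the mouse", (1, 3), 0),          # "cat chased the" = not a constituent
]

def span_vec(sentence, start, end):
    """Represent a span by concatenating its endpoint word vectors."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        h = model(**enc).last_hidden_state[0]
    ids = enc.word_ids()
    first = [ids.index(i) for i in range(len(sentence.split()))]
    return torch.cat([h[first[start]], h[first[end]]]).numpy()

X = [span_vec(s, a, b) for s, (a, b), _ in EXAMPLES]
y = [label for _, _, label in EXAMPLES]
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))  # toy sanity check, not an evaluation
```

A real evaluation would train on PTB spans and test on held-out trees; the five hand-labeled spans here only show the shape of the pipeline.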
This list is automatically generated from the titles and abstracts of the papers on this site.