Probing for Constituency Structure in Neural Language Models
- URL: http://arxiv.org/abs/2204.06201v1
- Date: Wed, 13 Apr 2022 07:07:37 GMT
- Title: Probing for Constituency Structure in Neural Language Models
- Authors: David Arps, Younes Samih, Laura Kallmeyer, Hassan Sajjad
- Abstract summary: We focus on constituent structure as represented in the Penn Treebank (PTB).
We find that four pretrained transformer LMs obtain high performance on our probing tasks.
We show that a complete constituency tree can be linearly separated from LM representations.
- Score: 11.359403179089817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate to what extent contextual neural language
models (LMs) implicitly learn syntactic structure. More concretely, we focus on
constituent structure as represented in the Penn Treebank (PTB). Using standard
probing techniques based on diagnostic classifiers, we assess the accuracy of
representing constituents of different categories within the neuron activations
of an LM such as RoBERTa. To ensure that our probe focuses on
syntactic knowledge and not on implicit semantic generalizations, we also
experiment on a PTB version that is obtained by randomly replacing constituents
with each other while keeping syntactic structure, i.e., a semantically
ill-formed but syntactically well-formed version of the PTB. We find that four
pretrained transformer LMs obtain high performance on our probing tasks even on
manipulated data, suggesting that semantic and syntactic knowledge in their
representations can be separated and that constituency information is in fact
learned by the LM. Moreover, we show that a complete constituency tree can be
linearly separated from LM representations.
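A minimal sketch of the kind of diagnostic probe this setup implies is shown below, assuming frozen roberta-base representations from Hugging Face transformers, mean-pooled span vectors, and a scikit-learn logistic regression as the linear classifier; the pooling choice, toy examples, and label set are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of a diagnostic probe: a linear classifier trained on frozen LM
# representations to predict constituent labels. Pooling strategy, label set,
# and toy data are illustrative assumptions, not the paper's exact setup.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

def span_representation(words, start, end):
    """Mean-pool final-layer vectors over the subwords of words[start:end]."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (num_subwords, dim)
    word_ids = enc.word_ids(0)
    idx = [i for i, w in enumerate(word_ids) if w is not None and start <= w < end]
    return hidden[idx].mean(dim=0)

# Toy (words, span start, span end, label) examples; real experiments would
# use PTB spans and their gold constituent categories.
examples = [
    ("the cat sat on the mat".split(), 0, 2, "NP"),
    ("the cat sat on the mat".split(), 3, 6, "PP"),
    ("she quickly left the room".split(), 3, 5, "NP"),
]
X = torch.stack([span_representation(w, a, b) for w, a, b, _ in examples]).numpy()
y = [label for *_, label in examples]

probe = LogisticRegression(max_iter=1000).fit(X, y)  # linear diagnostic classifier
print(probe.predict(X))
```

If such a probe recovers constituent categories, and span by span a full tree, from frozen vectors well above a baseline, that is the sense in which constituency information is linearly separable from the LM representations.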
Related papers
- Analysis of Argument Structure Constructions in a Deep Recurrent Language Model [0.0]
We explore the representation and processing of Argument Structure Constructions (ASCs) in a recurrent neural language model.
Our results show that sentence representations form distinct clusters corresponding to the four ASCs across all hidden layers.
This indicates that even a relatively simple, brain-constrained recurrent neural network can effectively differentiate between various construction types.
arXiv Detail & Related papers (2024-08-06T09:27:41Z) - Linear Spaces of Meanings: Compositional Structures in Vision-Language Models [110.00434385712786]
We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs)
We first present a framework for understanding compositional structures from a geometric perspective.
We then explain what these structures entail probabilistically in the case of VLM embeddings, providing intuitions for why they arise in practice.
arXiv Detail & Related papers (2023-02-28T08:11:56Z) - Synonym Detection Using Syntactic Dependency And Neural Embeddings [3.0770051635103974]
We study the role of syntactic dependencies in deriving distributional semantics using the Vector Space Model.
We study the effectiveness of injecting human-compiled semantic knowledge into neural embeddings on computing distributional similarity.
Our results show that the syntactically conditioned contexts can interpret lexical semantics better than the unconditioned ones.
arXiv Detail & Related papers (2022-09-30T03:16:41Z) - Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task [70.29624135819884]
We study the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates.
Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor is present.
arXiv Detail & Related papers (2022-04-14T11:33:15Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses [62.197912623223964]
We show a low-dimensional structure where language models and translation models smoothly interpolate between word embeddings, syntactic and semantic tasks, and future word embeddings.
We find that this representation embedding can predict how well each individual feature space maps to human brain responses to natural language stimuli recorded using fMRI.
This suggests that the embedding captures some part of the brain's natural language representation structure.
arXiv Detail & Related papers (2021-06-09T22:59:12Z) - Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z) - LSTMs Compose (and Learn) Bottom-Up [18.34617849764921]
Recent work in NLP shows that LSTM language models capture hierarchical structure in language data.
In contrast to existing work, we consider the learning process that leads to their compositional behavior.
We present a related measure of Decompositional Interdependence between word meanings in an LSTM, based on their gate interactions.
arXiv Detail & Related papers (2020-10-06T13:00:32Z) - Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z) - Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models [27.91397366776451]
Training LSTMs on latent structure (MIDI music or Java code) improves test performance on natural language.
Experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological similarity to the training language.
arXiv Detail & Related papers (2020-04-30T06:24:03Z) - Word Interdependence Exposes How LSTMs Compose Representations [18.34617849764921]
Recent work in NLP shows that LSTM language models capture compositional structure in language data.
We present a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates.
arXiv Detail & Related papers (2020-04-27T21:48:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.