A Non-Linear Structural Probe
- URL: http://arxiv.org/abs/2105.10185v1
- Date: Fri, 21 May 2021 07:53:10 GMT
- Title: A Non-Linear Structural Probe
- Authors: Jennifer C. White, Tiago Pimentel, Naomi Saphra, Ryan Cotterell
- Abstract summary: We study the case of a structural probe, which aims to investigate the encoding of syntactic structure in contextual representations.
By observing that the structural probe learns a metric, we are able to kernelize it and develop a novel non-linear variant.
We test on 6 languages and find that the radial-basis function (RBF) kernel, in conjunction with regularization, achieves a statistically significant improvement.
- Score: 43.50268085775569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probes are models devised to investigate the encoding of knowledge -- e.g.
syntactic structure -- in contextual representations. Probes are often designed
for simplicity, which has led to restrictions on probe design that may not
allow for the full exploitation of the structure of encoded information; one
such restriction is linearity. We examine the case of a structural probe
(Hewitt and Manning, 2019), which aims to investigate the encoding of syntactic
structure in contextual representations through learning only linear
transformations. By observing that the structural probe learns a metric, we are
able to kernelize it and develop a novel non-linear variant with an identical
number of parameters. We test on 6 languages and find that the radial-basis
function (RBF) kernel, in conjunction with regularization, achieves a
statistically significant improvement over the baseline in all languages --
implying that at least part of the syntactic knowledge is encoded non-linearly.
We conclude by discussing how the RBF kernel resembles BERT's self-attention
layers and speculate that this resemblance leads to the RBF-based probe's
stronger performance.
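To make the abstract's central idea concrete, below is a minimal sketch of the metric view of the structural probe and one way it can be kernelized with an RBF kernel. The function names, the hyperparameter gamma, and the choice to keep the learned linear map B inside the kernel are illustrative assumptions; the paper's exact parameterization may differ.

```python
import torch

def linear_probe_sq_dist(h_i, h_j, B):
    """Squared distance of the linear structural probe (Hewitt and Manning, 2019):
    d_B(h_i, h_j)^2 = ||B (h_i - h_j)||^2, where B is a learned linear map."""
    diff = (h_i - h_j) @ B.T            # project the difference vector with B
    return (diff * diff).sum(-1)

def rbf_probe_sq_dist(h_i, h_j, B, gamma=1.0):
    """Kernelized squared distance via the kernel trick,
    d^2 = k(x, x) - 2 k(x, y) + k(y, y), with the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2) applied on top of the same map B.
    Since k(x, x) = 1 for the RBF kernel, this reduces to 2 - 2 k(x, y).
    Sketch only: the paper's exact formulation may place gamma and B differently."""
    sq = linear_probe_sq_dist(h_i, h_j, B)
    return 2.0 - 2.0 * torch.exp(-gamma * sq)

# As in the original probe, training would fit B (and gamma) so that predicted
# squared distances between word pairs match gold dependency-tree distances,
# e.g. loss = |d_T(i, j) - probe_sq_dist(h_i, h_j)| averaged over pairs.
```

Up to the kernel hyperparameter gamma, both variants share the same learned parameters (the matrix B), which is consistent with the abstract's point that kernelizing the probe need not add parameters.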
Related papers
- Fast and Reliable Probabilistic Reflectometry Inversion with Prior-Amortized Neural Posterior Estimation [73.81105275628751]
Finding all structures compatible with reflectometry data is computationally prohibitive for standard algorithms.
We address this lack of reliability with a probabilistic deep learning method that identifies all realistic structures in seconds.
Our method, Prior-Amortized Neural Posterior Estimation (PANPE), combines simulation-based inference with novel adaptive priors.
arXiv Detail & Related papers (2024-07-26T10:29:16Z) - On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL [8.57550491437633]
This work investigates how encoder-decoder language models, specifically T5, handle linearized structured data.
Our findings reveal the model's ability to mimic human-designed processes such as schema linking and syntax prediction.
We also uncover insights into the model's internal mechanisms, including the ego-centric nature of structure node encodings.
arXiv Detail & Related papers (2024-04-03T01:16:20Z) - Hitting "Probe"rty with Non-Linearity, and More [2.1756081703276]
We reformulate the design of non-linear structural probes, making them simpler yet effective.
We qualitatively assess how strongly two words in a sentence are connected in the predicted dependency tree.
We find that the radial basis function (RBF) is an effective non-linear probe for the BERT model.
arXiv Detail & Related papers (2024-02-25T18:33:25Z) - Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
arXiv Detail & Related papers (2024-02-24T13:00:54Z) - Probing for Constituency Structure in Neural Language Models [11.359403179089817]
We focus on constituent structure as represented in the Penn Treebank (PTB).
We find that 4 pretrained transformer LMs obtain high performance on our probing tasks.
We show that a complete constituency tree can be linearly separated from LM representations.
arXiv Detail & Related papers (2022-04-13T07:07:37Z) - Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models [22.43510769150502]
It is not entirely clear what aspects of sentence-level syntax are captured by vector-based language representations.
We show that Transformers build sensitivity to larger parts of the sentence along their layers, and that hierarchical phrase structure plays a role in this process.
arXiv Detail & Related papers (2021-04-15T16:30:31Z) - Introducing Orthogonal Constraint in Structural Probes [0.2538209532048867]
We decompose a linear projection of language vector space into isomorphic space rotation and linear scaling directions.
We experimentally show that our approach can be performed in a multitask setting.
arXiv Detail & Related papers (2020-12-30T17:14:25Z) - Latent Template Induction with Gumbel-CRFs [107.17408593510372]
We explore the use of structured variational autoencoders to infer latent templates for sentence generation.
As a structured inference network, we show that it learns interpretable templates during training.
arXiv Detail & Related papers (2020-11-29T01:00:57Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - A Tale of a Probe and a Parser [74.14046092181947]
Measuring what linguistic information is encoded in neural models of language has become popular in NLP.
Researchers approach this enterprise by training "probes" - supervised models designed to extract linguistic structure from another model's output.
One such probe is the structural probe, designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
arXiv Detail & Related papers (2020-05-04T16:57:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.