Syntactic Perturbations Reveal Representational Correlates of
Hierarchical Phrase Structure in Pretrained Language Models
- URL: http://arxiv.org/abs/2104.07578v1
- Date: Thu, 15 Apr 2021 16:30:31 GMT
- Title: Syntactic Perturbations Reveal Representational Correlates of
Hierarchical Phrase Structure in Pretrained Language Models
- Authors: Matteo Alleman, Jonathan Mamou, Miguel A Del Rio, Hanlin Tang, Yoon
Kim, SueYeon Chung
- Abstract summary: It is not entirely clear what aspects of sentence-level syntax are captured by vector-based language representations.
We show that Transformers build sensitivity to larger parts of the sentence along their layers, and that hierarchical phrase structure plays a role in this process.
- Score: 22.43510769150502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While vector-based language representations from pretrained language models
have set a new standard for many NLP tasks, there is not yet a complete
accounting of their inner workings. In particular, it is not entirely clear
what aspects of sentence-level syntax are captured by these representations,
nor how (if at all) they are built along the stacked layers of the network. In
this paper, we aim to address such questions with a general class of
interventional, input perturbation-based analyses of representations from
pretrained language models. Importing from computational and cognitive
neuroscience the notion of representational invariance, we perform a series of
probes designed to test the sensitivity of these representations to several
kinds of structure in sentences. Each probe involves swapping words in a
sentence and comparing the representations from perturbed sentences against the
original. We experiment with three different perturbations: (1) random
permutations of n-grams of varying width, to test the scale at which a
representation is sensitive to word position; (2) swapping of two spans which
do or do not form a syntactic phrase, to test sensitivity to global phrase
structure; and (3) swapping of two adjacent words which do or do not break
apart a syntactic phrase, to test sensitivity to local phrase structure.
Results from these probes collectively suggest that Transformers build
sensitivity to larger parts of the sentence along their layers, and that
hierarchical phrase structure plays a role in this process. More broadly, our
results also indicate that structured input perturbations widen the scope of
analyses that can be performed on often-opaque deep learning systems, and can
serve as a complement to existing tools (such as supervised linear probes) for
interpreting complex black-box models.
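
To make the perturbation methodology concrete, the snippet below is a minimal sketch of the first probe (random shuffling of n-grams) applied to a pretrained Transformer, comparing each layer's representation of the original and perturbed sentence. It assumes bert-base-uncased from the Hugging Face transformers library, mean-pooled hidden states, and cosine distance as the comparison metric; these are illustrative choices for the sketch, not necessarily the exact setup used in the paper.
```python
# Sketch of the n-gram shuffling probe: perturb a sentence, then measure
# how much each layer's representation moves relative to the original.
# Assumptions (not from the paper): bert-base-uncased, mean pooling, cosine distance.
import random
import torch
from transformers import AutoModel, AutoTokenizer

def shuffle_ngrams(words, n):
    """Randomly permute contiguous n-grams of a word sequence."""
    chunks = [words[i:i + n] for i in range(0, len(words), n)]
    random.shuffle(chunks)
    return [w for chunk in chunks for w in chunk]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layer_representations(sentence):
    """Return one mean-pooled vector per layer (embeddings + 12 Transformer layers)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return [h.mean(dim=1).squeeze(0) for h in outputs.hidden_states]

sentence = "the quick brown fox jumps over the lazy dog"
perturbed = " ".join(shuffle_ngrams(sentence.split(), n=2))

orig_layers = layer_representations(sentence)
pert_layers = layer_representations(perturbed)

# Higher distance = the layer is more sensitive to (less invariant under) the perturbation.
for layer, (o, p) in enumerate(zip(orig_layers, pert_layers)):
    dist = 1 - torch.nn.functional.cosine_similarity(o, p, dim=0).item()
    print(f"layer {layer:2d}: cosine distance {dist:.4f}")
```
The same loop can be repeated while varying n, or with the phrase-swap and adjacent-word-swap perturbations, to trace how each layer's invariance changes with the scale and syntactic status of the perturbation.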
Related papers
- Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement [1.4335183427838039]
We take the approach of developing curated synthetic data on a large scale, with specific properties.
We use a new multiple-choice task and datasets, Blackbird Language Matrices, to focus on a specific grammatical structural phenomenon.
We show that despite having been trained on multilingual texts in a consistent manner, multilingual pretrained language models have language-specific differences.
arXiv Detail & Related papers (2024-09-10T14:58:55Z)
- Linguistic Structure Induction from Language Models [1.8130068086063336]
This thesis focuses on producing constituency and dependency structures from Language Models (LMs) in an unsupervised setting.
I present a detailed study on StructFormer (SF), which retrofits a transformer architecture with an encoder network to produce constituency and dependency structures.
I present six experiments to analyze and address this field's challenges.
arXiv Detail & Related papers (2024-03-11T16:54:49Z)
- Learning Disentangled Speech Representations [0.412484724941528]
SynSpeech is a novel large-scale synthetic speech dataset designed to enable research on disentangled speech representations.
We present a framework to evaluate disentangled representation learning techniques, applying both linear probing and established supervised disentanglement metrics.
We find that SynSpeech facilitates benchmarking across a range of factors, achieving promising disentanglement of simpler features like gender and speaking style, while highlighting challenges in isolating complex attributes like speaker identity.
arXiv Detail & Related papers (2023-11-04T04:54:17Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Syntactic representation learning for neural network based TTS with syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
arXiv Detail & Related papers (2020-12-13T05:52:07Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- A Self-supervised Representation Learning of Sentence Structure for Authorship Attribution [3.5991811164452923]
We propose a self-supervised framework for learning structural representations of sentences.
We evaluate the learned structural representations of sentences using different probing tasks, and subsequently utilize them in the authorship attribution task.
arXiv Detail & Related papers (2020-10-14T02:57:10Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models, combined with appropriate training settings, can effectively yield universal representations.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)