Word Interdependence Exposes How LSTMs Compose Representations
- URL: http://arxiv.org/abs/2004.13195v1
- Date: Mon, 27 Apr 2020 21:48:08 GMT
- Title: Word Interdependence Exposes How LSTMs Compose Representations
- Authors: Naomi Saphra and Adam Lopez
- Abstract summary: Recent work in NLP shows that LSTM language models capture compositional structure in language data.
We present a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates.
- Score: 18.34617849764921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in NLP shows that LSTM language models capture compositional
structure in language data. For a closer look at how these representations are
composed hierarchically, we present a novel measure of interdependence between
word meanings in an LSTM, based on their interactions at the internal gates. To
explore how compositional representations arise over training, we conduct
simple experiments on synthetic data, which illustrate our measure by showing
how high interdependence can hurt generalization. These synthetic experiments
also illustrate a specific hypothesis about how hierarchical structures are
discovered over the course of training: that parent constituents rely on
effective representations of their children, rather than on learning long-range
relations independently. We further support this measure with experiments on
English language data, where interdependence is higher for more closely
syntactically linked word pairs.
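The abstract frames interdependence in terms of how two words' contributions interact at the LSTM's internal gates. As a rough illustration only, the sketch below is a minimal occlusion-style proxy for that idea, not the paper's actual measure: it scores a word pair by how far the joint effect of ablating both embeddings departs from the sum of their individual ablation effects on the final hidden state. The toy model, the helper names (`final_hidden`, `interdependence`), and the normalization are all assumptions made for the example.
```python
# Hypothetical sketch (PyTorch): an occlusion-based proxy for word-pair
# interdependence in an LSTM. Two words count as interdependent when their
# joint effect on the hidden state is not the sum of their individual effects.
# This illustrates the general idea, not the paper's gate-level measure.
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, embed_dim, hidden_dim, seq_len = 100, 32, 64, 8
embed = nn.Embedding(vocab_size, embed_dim)             # toy, untrained components
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

@torch.no_grad()
def final_hidden(token_ids, zero_positions=()):
    """Final hidden state with the embeddings at `zero_positions` zeroed out."""
    x = embed(token_ids.unsqueeze(0)).clone()           # (1, seq_len, embed_dim)
    for pos in zero_positions:
        x[0, pos, :] = 0.0
    _, (h, _) = lstm(x)
    return h.squeeze()                                   # (hidden_dim,)

def interdependence(token_ids, i, j):
    """Non-additivity of ablating words i and j, relative to their joint effect."""
    full = final_hidden(token_ids)
    effect_i = full - final_hidden(token_ids, (i,))
    effect_j = full - final_hidden(token_ids, (j,))
    effect_ij = full - final_hidden(token_ids, (i, j))
    nonadditive = effect_ij - (effect_i + effect_j)
    return (nonadditive.norm() / (effect_ij.norm() + 1e-8)).item()

tokens = torch.randint(0, vocab_size, (seq_len,))
print(interdependence(tokens, 2, 3))    # larger values -> stronger interaction
```
On a trained model, scores of this kind would be expected to be higher for closely syntactically linked word pairs, which is the pattern the paper reports for its gate-based measure on English data.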
Related papers
- Knowing Your Nonlinearities: Shapley Interactions Reveal the Underlying Structure of Data [8.029715695737567]
We use Shapley Taylor interaction indices (STII) to analyze the impact of underlying data structure on model representations.
Considering linguistic structure in masked and auto-regressive language models (MLMs and ALMs), we find that STII increases within idiomatic expressions.
Our speech model findings reflect the phonetic principle that the openness of the oral cavity determines how much a phoneme varies based on its context.
arXiv Detail & Related papers (2024-03-19T19:13:22Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Probing for Constituency Structure in Neural Language Models [11.359403179089817]
We focus on constituent structure as represented in the Penn Treebank (PTB).
We find that 4 pretrained transformer LMs obtain high performance on our probing tasks.
We show that a complete constituency tree can be linearly separated from LM representations.
arXiv Detail & Related papers (2022-04-13T07:07:37Z) - Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z) - Imposing Relation Structure in Language-Model Embeddings Using
Contrastive Learning [30.00047118880045]
We propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure.
The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task.
arXiv Detail & Related papers (2021-09-02T10:58:27Z) - ERICA: Improving Entity and Relation Understanding for Pre-trained
Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z) - Introducing Syntactic Structures into Target Opinion Word Extraction
with Deep Learning [89.64620296557177]
We propose to incorporate the syntactic structures of the sentences into the deep learning models for targeted opinion word extraction.
We also introduce a novel regularization technique to improve the performance of the deep learning models.
The proposed model is extensively analyzed and achieves state-of-the-art performance on four benchmark datasets.
arXiv Detail & Related papers (2020-10-26T07:13:17Z) - High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model.
It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs.
Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to strong-performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z) - LSTMs Compose (and Learn) Bottom-Up [18.34617849764921]
Recent work in NLP shows that LSTM language models capture hierarchical structure in language data.
In contrast to existing work, we consider the learning process that leads to their compositional behavior.
We present a related measure of Decompositional Interdependence between word meanings in an LSTM, based on their gate interactions.
arXiv Detail & Related papers (2020-10-06T13:00:32Z) - Exploiting Syntactic Structure for Better Language Modeling: A Syntactic
Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z) - Attribution Analysis of Grammatical Dependencies in LSTMs [0.043512163406551986]
LSTM language models have been shown to capture syntax-sensitive grammatical dependencies with a high degree of accuracy.
We show that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns.
Our results suggest that LSTM language models are able to infer robust representations of syntactic dependencies.
arXiv Detail & Related papers (2020-04-30T19:19:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.