LSTMs Compose (and Learn) Bottom-Up
- URL: http://arxiv.org/abs/2010.04650v1
- Date: Tue, 6 Oct 2020 13:00:32 GMT
- Title: LSTMs Compose (and Learn) Bottom-Up
- Authors: Naomi Saphra and Adam Lopez
- Abstract summary: Recent work in NLP shows that LSTM language models capture hierarchical structure in language data.
In contrast to existing work, we consider the learning process that leads to their compositional behavior.
We present a related measure of Decompositional Interdependence between word meanings in an LSTM, based on their gate interactions.
- Score: 18.34617849764921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work in NLP shows that LSTM language models capture hierarchical
structure in language data. In contrast to existing work, we consider the
\textit{learning} process that leads to their compositional behavior. For a
closer look at how an LSTM's sequential representations are composed
hierarchically, we present a related measure of Decompositional Interdependence
(DI) between word meanings in an LSTM, based on their gate interactions. We
connect this measure to syntax with experiments on English language data, where
DI is higher on pairs of words with lower syntactic distance. To explore the
inductive biases that cause these compositional representations to arise during
training, we conduct simple experiments on synthetic data. These synthetic
experiments support a specific hypothesis about how hierarchical structures are
discovered over the course of training: that LSTM constituent representations
are learned bottom-up, relying on effective representations of their shorter
children, rather than learning the longer-range relations independently from
children.
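The abstract's core intuition can be sketched in a few lines: a pair of words is "interdependent" to the extent that its joint contribution to a hidden state deviates from the sum of the words' individual contributions. The sketch below is a hypothetical illustration only, not the paper's definition; it assumes contribution vectors (here called `beta`) are already available from some decomposition, whereas the paper derives them from the LSTM's internal gate interactions and may normalize differently.

```python
import math

# Hedged sketch of a Decompositional-Interdependence-style score.
# Assumption: some decomposition assigns each word (or word pair) a
# contribution vector `beta` to the LSTM hidden state; the paper
# derives such contributions from the gates, and its exact
# normalization may differ from the one used here.

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def interdependence(beta_ab, beta_a, beta_b, eps=1e-8):
    """How far the joint contribution of words A and B deviates from
    the sum of their individual contributions.

    ~0 -> A and B contribute additively (independent meanings)
    >0 -> their representations interact, as the abstract reports for
          word pairs at low syntactic distance (e.g. one constituent)
    """
    residual = [ab - (a + b) for ab, a, b in zip(beta_ab, beta_a, beta_b)]
    return _norm(residual) / (_norm(beta_ab) + eps)

beta_a = [1.0, 0.0, 2.0]
beta_b = [0.0, 1.0, -1.0]

# Perfectly additive joint contribution: score is 0.
additive = [a + b for a, b in zip(beta_a, beta_b)]
print(interdependence(additive, beta_a, beta_b))  # -> 0.0

# An interaction term pushes the score above 0.
interacting = [x + 0.5 for x in additive]
print(interdependence(interacting, beta_a, beta_b))
```

Under this reading, the bottom-up hypothesis predicts that such interaction terms appear first for short, adjacent constituents during training and only later for longer-range pairs.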
Related papers
- Analysis of Argument Structure Constructions in a Deep Recurrent Language Model [0.0]
We explore the representation and processing of Argument Structure Constructions (ASCs) in a recurrent neural language model.
Our results show that sentence representations form distinct clusters corresponding to the four ASCs across all hidden layers.
This indicates that even a relatively simple, brain-constrained recurrent neural network can effectively differentiate between various construction types.
arXiv Detail & Related papers (2024-08-06T09:27:41Z)
- Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.
We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus.
Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Probing for Constituency Structure in Neural Language Models [11.359403179089817]
We focus on constituent structure as represented in the Penn Treebank (PTB).
We find that four pretrained transformer LMs obtain high performance on our probing tasks.
We show that a complete constituency tree can be linearly separated from LM representations.
arXiv Detail & Related papers (2022-04-13T07:07:37Z) - Syntactic representation learning for neural network based TTS with
syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
arXiv Detail & Related papers (2020-12-13T05:52:07Z) - Introducing Syntactic Structures into Target Opinion Word Extraction
with Deep Learning [89.64620296557177]
We propose to incorporate the syntactic structures of the sentences into the deep learning models for targeted opinion word extraction.
We also introduce a novel regularization technique to improve the performance of the deep learning models.
The proposed model is extensively analyzed and achieves the state-of-the-art performance on four benchmark datasets.
arXiv Detail & Related papers (2020-10-26T07:13:17Z) - Structural Supervision Improves Few-Shot Learning and Syntactic
Generalization in Neural Language Models [47.42249565529833]
Humans can learn structural properties about a word from minimal experience.
We assess the ability of modern neural language models to reproduce this behavior in English.
arXiv Detail & Related papers (2020-10-12T14:12:37Z) - Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM
Language Models [22.826154706036995]
LSTM-based recurrent neural networks are the state-of-the-art for many natural language processing (NLP) tasks.
Without an understanding of how LSTMs capture such structure, the generality of their performance on this task and their suitability for related tasks remain uncertain.
We introduce *influence paths*, a causal account of structural properties as carried by paths across gates and neurons of a recurrent neural network.
arXiv Detail & Related papers (2020-05-03T21:10:31Z) - Attribution Analysis of Grammatical Dependencies in LSTMs [0.043512163406551986]
LSTM language models have been shown to capture syntax-sensitive grammatical dependencies with a high degree of accuracy.
We show that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns.
Our results suggest that LSTM language models are able to infer robust representations of syntactic dependencies.
arXiv Detail & Related papers (2020-04-30T19:19:37Z) - Word Interdependence Exposes How LSTMs Compose Representations [18.34617849764921]
Recent work in NLP shows that LSTM language models capture compositional structure in language data.
We present a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates.
arXiv Detail & Related papers (2020-04-27T21:48:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.