LSTMs Compose (and Learn) Bottom-Up
- URL: http://arxiv.org/abs/2010.04650v1
- Date: Tue, 6 Oct 2020 13:00:32 GMT
- Title: LSTMs Compose (and Learn) Bottom-Up
- Authors: Naomi Saphra and Adam Lopez
- Abstract summary: Recent work in NLP shows that LSTM language models capture hierarchical structure in language data.
In contrast to existing work, we consider the learning process that leads to their compositional behavior.
We present a related measure of Decompositional Interdependence between word meanings in an LSTM, based on their gate interactions.
- Score: 18.34617849764921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work in NLP shows that LSTM language models capture hierarchical
structure in language data. In contrast to existing work, we consider the
\textit{learning} process that leads to their compositional behavior. For a
closer look at how an LSTM's sequential representations are composed
hierarchically, we present a related measure of Decompositional Interdependence
(DI) between word meanings in an LSTM, based on their gate interactions. We
connect this measure to syntax with experiments on English language data, where
DI is higher on pairs of words with lower syntactic distance. To explore the
inductive biases that cause these compositional representations to arise during
training, we conduct simple experiments on synthetic data. These synthetic
experiments support a specific hypothesis about how hierarchical structures are
discovered over the course of training: that LSTM constituent representations
are learned bottom-up, relying on effective representations of their shorter
children, rather than learning the longer-range relations independently from
children.
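The abstract's core intuition can be sketched in a few lines: a pair of words is "interdependent" to the extent that its joint contribution to a hidden state deviates from the sum of the words' individual contributions. The sketch below is a hypothetical illustration only, not the paper's definition; it assumes contribution vectors (here called `beta`) are already available from some decomposition, whereas the paper derives them from the LSTM's internal gate interactions and may normalize differently.

```python
import math

# Hedged sketch of a Decompositional-Interdependence-style score.
# Assumption: some decomposition assigns each word (or word pair) a
# contribution vector `beta` to the LSTM hidden state; the paper
# derives such contributions from the gates, and its exact
# normalization may differ from the one used here.

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def interdependence(beta_ab, beta_a, beta_b, eps=1e-8):
    """How far the joint contribution of words A and B deviates from
    the sum of their individual contributions.

    ~0 -> A and B contribute additively (independent meanings)
    >0 -> their representations interact, as the abstract reports for
          word pairs at low syntactic distance (e.g. one constituent)
    """
    residual = [ab - (a + b) for ab, a, b in zip(beta_ab, beta_a, beta_b)]
    return _norm(residual) / (_norm(beta_ab) + eps)

beta_a = [1.0, 0.0, 2.0]
beta_b = [0.0, 1.0, -1.0]

# Perfectly additive joint contribution: score is 0.
additive = [a + b for a, b in zip(beta_a, beta_b)]
print(interdependence(additive, beta_a, beta_b))  # -> 0.0

# An interaction term pushes the score above 0.
interacting = [x + 0.5 for x in additive]
print(interdependence(interacting, beta_a, beta_b))
```

Under this reading, the bottom-up hypothesis predicts that such interaction terms appear first for short, adjacent constituents during training and only later for longer-range pairs.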
Related papers
- Analysis of Argument Structure Constructions in a Deep Recurrent Language Model [0.0]
We explore the representation and processing of Argument Structure Constructions (ASCs) in a recurrent neural language model.
Our results show that sentence representations form distinct clusters corresponding to the four ASCs across all hidden layers.
This indicates that even a relatively simple, brain-constrained recurrent neural network can effectively differentiate between various construction types.
arXiv Detail & Related papers (2024-08-06T09:27:41Z)
- Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.
We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus.
Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Probing for Constituency Structure in Neural Language Models [11.359403179089817]
We focus on constituent structure as represented in the Penn Treebank (PTB).
We find that four pretrained transformer LMs obtain high performance on our probing tasks.
We show that a complete constituency tree can be linearly separated from LM representations.
arXiv Detail & Related papers (2022-04-13T07:07:37Z) - Syntactic representation learning for neural network based TTS with
syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
arXiv Detail & Related papers (2020-12-13T05:52:07Z) - Introducing Syntactic Structures into Target Opinion Word Extraction
with Deep Learning [89.64620296557177]
We propose to incorporate the syntactic structures of the sentences into the deep learning models for targeted opinion word extraction.
We also introduce a novel regularization technique to improve the performance of the deep learning models.
The proposed model is extensively analyzed and achieves the state-of-the-art performance on four benchmark datasets.
arXiv Detail & Related papers (2020-10-26T07:13:17Z) - Structural Supervision Improves Few-Shot Learning and Syntactic
Generalization in Neural Language Models [47.42249565529833]
Humans can learn structural properties about a word from minimal experience.
We assess the ability of modern neural language models to reproduce this behavior in English.
arXiv Detail & Related papers (2020-10-12T14:12:37Z) - Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM
Language Models [22.826154706036995]
LSTM-based recurrent neural networks are the state-of-the-art for many natural language processing (NLP) tasks.
Without an understanding of how LSTMs capture such structure, the generality of their performance on this task and their suitability for related tasks remain uncertain.
We introduce *influence paths*, a causal account of structural properties as carried by paths across gates and neurons of a recurrent neural network.
arXiv Detail & Related papers (2020-05-03T21:10:31Z) - Attribution Analysis of Grammatical Dependencies in LSTMs [0.043512163406551986]
LSTM language models have been shown to capture syntax-sensitive grammatical dependencies with a high degree of accuracy.
We show that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns.
Our results suggest that LSTM language models are able to infer robust representations of syntactic dependencies.
arXiv Detail & Related papers (2020-04-30T19:19:37Z) - Word Interdependence Exposes How LSTMs Compose Representations [18.34617849764921]
Recent work in NLP shows that LSTM language models capture compositional structure in language data.
We present a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates.
arXiv Detail & Related papers (2020-04-27T21:48:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.