Attribution Analysis of Grammatical Dependencies in LSTMs
- URL: http://arxiv.org/abs/2005.00062v1
- Date: Thu, 30 Apr 2020 19:19:37 GMT
- Title: Attribution Analysis of Grammatical Dependencies in LSTMs
- Authors: Yiding Hao
- Abstract summary: LSTM language models have been shown to capture syntax-sensitive grammatical dependencies with a high degree of accuracy.
We show that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns.
Our results suggest that LSTM language models are able to infer robust representations of syntactic dependencies.
- Score: 0.043512163406551986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LSTM language models have been shown to capture syntax-sensitive grammatical
dependencies such as subject-verb agreement with a high degree of accuracy
(Linzen et al., 2016, inter alia). However, questions remain regarding whether
they do so using spurious correlations, or whether they are truly able to match
verbs with their subjects. This paper argues for the latter hypothesis. Using
layer-wise relevance propagation (Bach et al., 2015), a technique that
quantifies the contributions of input features to model behavior, we show that
LSTM performance on number agreement is directly correlated with the model's
ability to distinguish subjects from other nouns. Our results suggest that LSTM
language models are able to infer robust representations of syntactic
dependencies.
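For concreteness, the number agreement task uses minimal pairs such as "The keys to the cabinet are/is on the table," where the model must associate the verb with the subject "keys" rather than the intervening noun "cabinet." The snippet below is a minimal, illustrative sketch of the layer-wise relevance propagation idea: it shows only the standard epsilon-rule for a single linear layer, with all names chosen for illustration. The paper's LSTM-specific propagation rules (e.g., for multiplicative gate interactions) are not reproduced here.
```python
# Minimal sketch (not the paper's implementation): the LRP epsilon-rule
# for a single linear layer z = W @ a + b. Relevance assigned to the
# output units is redistributed to the inputs in proportion to each
# input's contribution to the pre-activation.
import numpy as np

def lrp_epsilon_linear(a, W, b, R_out, eps=1e-6):
    """Redistribute output relevance R_out onto the input activations a."""
    z = W @ a + b                                   # pre-activations z_j
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer avoids division by ~0
    contrib = W * a                                 # contrib[j, i] = W[j, i] * a[i]
    return contrib.T @ (R_out / z_stab)             # R_i = sum_j contrib[j, i] * R_out[j] / z_stab[j]

# Toy usage: relevance of 3 inputs to 2 outputs.
rng = np.random.default_rng(0)
a = np.array([0.5, -1.0, 2.0])        # toy input activations
W = rng.standard_normal((2, 3))       # toy weights: 2 outputs, 3 inputs
b = np.zeros(2)
R_in = lrp_epsilon_linear(a, W, b, R_out=np.array([1.0, 0.0]))
print(R_in, R_in.sum())               # with zero bias, total relevance is (approximately) conserved
```
In the paper's setting, relevance is propagated from the verb prediction back to the input tokens, so that the relevance assigned to the subject can be compared with the relevance assigned to other nouns in the sentence.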
Related papers
- Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback [50.84142264245052]
This work introduces the Align-SLM framework to enhance the semantic understanding of textless Spoken Language Models (SLMs).
Our approach generates multiple speech continuations from a given prompt and uses semantic metrics to create preference data for Direct Preference Optimization (DPO); a minimal sketch of the DPO preference loss appears after this list.
We evaluate the framework using the ZeroSpeech 2021 benchmarks for lexical and syntactic modeling, the spoken version of the StoryCloze dataset for semantic coherence, and other speech generation metrics, including the GPT-4o score and human evaluation.
arXiv Detail & Related papers (2024-11-04T06:07:53Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics
Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests that LMs may serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - Language model acceptability judgements are not always robust to context [30.868765627701457]
We investigate the stability of language models' performance on targeted syntactic evaluations.
We find that model judgements are generally robust when placed in randomly sampled linguistic contexts, but that they can shift when the context more closely resembles the test input.
We show that these changes in model performance are not explainable by simple features matching the context and the test inputs.
arXiv Detail & Related papers (2022-12-18T00:11:06Z) - More Than Words: Collocation Tokenization for Latent Dirichlet
Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized
Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performance comparable to that achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - Masked Language Modeling and the Distributional Hypothesis: Order Word
Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z) - High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model.
It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs.
Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to strong-performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z) - LSTMs Compose (and Learn) Bottom-Up [18.34617849764921]
Recent work in NLP shows that LSTM language models capture hierarchical structure in language data.
In contrast to existing work, we consider the learning process that leads to their compositional behavior.
We present a related measure of Decompositional Interdependence between word meanings in an LSTM, based on their gate interactions.
arXiv Detail & Related papers (2020-10-06T13:00:32Z) - Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM
Language Models [22.826154706036995]
LSTM-based recurrent neural networks are the state-of-the-art for many natural language processing (NLP) tasks.
Lacking an understanding of how LSTMs capture dependencies such as subject-verb number agreement, the generality of LSTM performance on this task and their suitability for related tasks remain uncertain.
We introduce *influence paths*, a causal account of structural properties as carried by paths across gates and neurons of a recurrent neural network.
arXiv Detail & Related papers (2020-05-03T21:10:31Z) - Recurrent Neural Network Language Models Always Learn English-Like
Relative Clause Attachment [17.995905582226463]
We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish.
English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences.
arXiv Detail & Related papers (2020-05-01T01:21:47Z) - An enhanced Tree-LSTM architecture for sentence semantic modeling using
typed dependencies [0.0]
Tree-based Long Short-Term Memory (LSTM) networks have become state-of-the-art for modeling the meaning of language texts.
This paper proposes an enhanced LSTM architecture, called relation gated LSTM, which can model the relationship between two inputs of a sequence.
We also introduce a Tree-LSTM model called Typed Dependency Tree-LSTM that uses the sentence dependency parse structure and the dependency type to embed sentence meaning into a dense vector.
arXiv Detail & Related papers (2020-02-18T18:10:03Z)
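As noted in the Align-SLM entry above, the following is a minimal sketch of the Direct Preference Optimization (DPO) preference loss that such preference data would be used with. It illustrates the general DPO objective under the assumption of per-sequence summed log-probabilities; it is not Align-SLM's implementation, and the numbers in the usage example are placeholders.
```python
# Illustrative DPO loss: the policy is pushed to prefer the chosen continuation
# over the rejected one by a larger margin than a frozen reference model does.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss from summed log-probabilities of the chosen (preferred) and
    rejected continuations under the trained policy and the reference model."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp          # implicit reward of chosen
    rejected_margin = policy_rejected_logp - ref_rejected_logp    # implicit reward of rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with a batch of two preference pairs (placeholder log-probabilities).
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.2]))
print(loss.item())
```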
This list is automatically generated from the titles and abstracts of the papers on this site.