Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM
Language Models
- URL: http://arxiv.org/abs/2005.01190v1
- Date: Sun, 3 May 2020 21:10:31 GMT
- Title: Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM
Language Models
- Authors: Kaiji Lu, Piotr Mardziel, Klas Leino, Matt Fredrikson, Anupam Datta
- Abstract summary: LSTM-based recurrent neural networks are the state-of-the-art for many natural language processing (NLP) tasks.
Lacking this understanding, the generality of LSTM performance on this task and their suitability for related tasks remain uncertain.
We introduce *influence paths*, a causal account of structural properties as carried by paths across gates and neurons of a recurrent neural network.
- Score: 22.826154706036995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LSTM-based recurrent neural networks are the state-of-the-art for many
natural language processing (NLP) tasks. Despite their performance, it is
unclear whether, or how, LSTMs learn structural features of natural languages
such as subject-verb number agreement in English. Lacking this understanding,
the generality of LSTM performance on this task and their suitability for
related tasks remain uncertain. Further, errors cannot be properly attributed
to a lack of structural capability, training data omissions, or other
exceptional faults. We introduce *influence paths*, a causal account of
structural properties as carried by paths across gates and neurons of a
recurrent neural network. The approach refines the notion of influence (the
subject's grammatical number has influence on the grammatical number of the
subsequent verb) into a set of gate or neuron-level paths. The set localizes
and segments the concept (e.g., subject-verb agreement), its constituent
elements (e.g., the subject), and related or interfering elements (e.g.,
attractors). We exemplify the methodology on a widely-studied multi-layer LSTM
language model, demonstrating its accounting for subject-verb number agreement.
The results offer both a finer and a more complete view of an LSTM's handling
of this structural aspect of the English language than prior results based on
diagnostic classifiers and ablation.
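As a concrete illustration of the agreement task the paper analyzes, the sketch below shows the standard targeted-evaluation setup: a language model reads a prefix ending just before the verb and is judged on whether it assigns higher probability to the verb form that agrees with the subject. The model, vocabulary, and example sentence are hypothetical placeholders (untrained, random weights); this is not the paper's influence-path method, which goes further and traces how the agreement signal is carried by specific gates and neurons.

```python
# Minimal sketch (not the paper's code): targeted evaluation of subject-verb
# number agreement with an LSTM language model. Everything below is a toy
# placeholder; a real study would load a trained word-level LSTM LM.
import torch
import torch.nn as nn

# Hypothetical toy vocabulary.
vocab = ["<unk>", "the", "keys", "to", "cabinet", "is", "are"]
idx = {w: i for i, w in enumerate(vocab)}

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)  # next-token logits at every position

model = LSTMLanguageModel(len(vocab))
model.eval()

# Prefix with an "attractor" noun ("cabinet") between the plural subject and the verb.
prefix = ["the", "keys", "to", "the", "cabinet"]
ids = torch.tensor([[idx[w] for w in prefix]])

with torch.no_grad():
    logits = model(ids)[0, -1]               # logits for the token after the prefix
    log_probs = torch.log_softmax(logits, dim=-1)

# Agreement is counted as correct if the plural verb outscores the singular one.
plural = log_probs[idx["are"]].item()
singular = log_probs[idx["is"]].item()
print(f"log P(are) = {plural:.3f}, log P(is) = {singular:.3f}, "
      f"agreement {'correct' if plural > singular else 'incorrect'}")
```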
Related papers
- Large Language Models are Interpretable Learners [53.56735770834617]
In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge the gap between expressiveness and interpretability.
The pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts.
As the knowledge learned by an LSP (LLM-based symbolic program) is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable) and to other LLMs.
arXiv Detail & Related papers (2024-06-25T02:18:15Z)
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs).
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
- Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer [50.572974726351504]
We propose C-FNT, a novel end-to-end (E2E) model that incorporates class-based LMs into the factorized neural transducer (FNT).
In C-FNT, the LM score of named entities can be associated with the name class instead of its surface form.
The experimental results show that our proposed C-FNT significantly reduces error in named entities without hurting performance in general word recognition.
arXiv Detail & Related papers (2023-09-14T12:14:49Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative [7.03497683558609]
Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics.
We investigate the capability of pretrained language models (PLMs) to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC).
Our results show that all three investigated PLMs are able to recognise the structure of the CC but fail to use its meaning.
arXiv Detail & Related papers (2022-10-24T13:01:24Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Demystifying Neural Language Models' Insensitivity to Word-Order [7.72780997900827]
We investigate the insensitivity of natural language models to word-order by quantifying perturbations.
We find that neural language models rely on the local ordering of tokens more than on their global ordering.
arXiv Detail & Related papers (2021-07-29T13:34:20Z)
- SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI), or summarisation, has enabled them to be ranked among the best paradigms for addressing such tasks at present.
NLI is one of the best scenarios for testing these architectures, due to the knowledge required to understand complex sentences and to establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z)
- LSTMs Compose (and Learn) Bottom-Up [18.34617849764921]
Recent work in NLP shows that LSTM language models capture hierarchical structure in language data.
In contrast to existing work, we consider the *learning process* that leads to their compositional behavior.
We present a related measure of Decompositional Interdependence between word meanings in an LSTM, based on their gate interactions.
arXiv Detail & Related papers (2020-10-06T13:00:32Z)
- How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? [9.248882589228089]
Long short-term memory (LSTM) networks are capable of encapsulating long-range dependencies.
Simple recurrent networks (SRNs) have generally been less successful at capturing long-range dependencies.
We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations.
arXiv Detail & Related papers (2020-05-17T09:13:28Z)
- Attribution Analysis of Grammatical Dependencies in LSTMs [0.043512163406551986]
LSTM language models have been shown to capture syntax-sensitive grammatical dependencies with a high degree of accuracy.
We show that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns.
Our results suggest that LSTM language models are able to infer robust representations of syntactic dependencies.
arXiv Detail & Related papers (2020-04-30T19:19:37Z)
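For contrast with the influence-path approach, here is a minimal sketch of the diagnostic-classifier probing that the abstract above and several related papers rely on: a linear classifier is trained on LSTM hidden states to predict the subject's grammatical number. The hidden states and labels below are random placeholders (in a real study they would be extracted from a trained LM over annotated sentences); only the probing mechanics are shown.

```python
# Minimal diagnostic-classifier probe sketch with placeholder data, for
# illustration only. With real hidden states, above-chance accuracy is read
# as evidence that the number feature is linearly decodable; influence paths
# instead trace how that information flows through specific gates and neurons.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_dim, n_examples = 64, 1000

# Placeholder hidden states (one vector per sentence, taken at the pre-verb
# position) and binary labels: 0 = singular subject, 1 = plural subject.
states = rng.normal(size=(n_examples, hidden_dim))
labels = rng.integers(0, 2, size=n_examples)

train_x, test_x, train_y, test_y = train_test_split(states, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(train_x, train_y)

print("probe accuracy:", probe.score(test_x, test_y))
```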
This list is automatically generated from the titles and abstracts of the papers on this site.