Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM
Language Models
- URL: http://arxiv.org/abs/2005.01190v1
- Date: Sun, 3 May 2020 21:10:31 GMT
- Title: Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM
Language Models
- Authors: Kaiji Lu, Piotr Mardziel, Klas Leino, Matt Fredrikson, Anupam Datta
- Abstract summary: LSTM-based recurrent neural networks are the state-of-the-art for many natural language processing (NLP) tasks.
Lacking this understanding, the generality of LSTM performance on this task and their suitability for related tasks remain uncertain.
We introduce *influence paths*, a causal account of structural properties as carried by paths across gates and neurons of a recurrent neural network.
- Score: 22.826154706036995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LSTM-based recurrent neural networks are the state-of-the-art for many
natural language processing (NLP) tasks. Despite their performance, it is
unclear whether, or how, LSTMs learn structural features of natural languages
such as subject-verb number agreement in English. Lacking this understanding,
the generality of LSTM performance on this task and their suitability for
related tasks remain uncertain. Further, errors cannot be properly attributed
to a lack of structural capability, training data omissions, or other
exceptional faults. We introduce *influence paths*, a causal account of
structural properties as carried by paths across gates and neurons of a
recurrent neural network. The approach refines the notion of influence (the
subject's grammatical number has influence on the grammatical number of the
subsequent verb) into a set of gate or neuron-level paths. The set localizes
and segments the concept (e.g., subject-verb agreement), its constituent
elements (e.g., the subject), and related or interfering elements (e.g.,
attractors). We exemplify the methodology on a widely-studied multi-layer LSTM
language model, demonstrating its accounting for subject-verb number agreement.
The results offer both a finer and a more complete view of an LSTM's handling
of this structural aspect of the English language than prior results based on
diagnostic classifiers and ablation.
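As a concrete illustration of the agreement task the paper analyzes, the sketch below shows the standard targeted-evaluation setup: a language model reads a prefix ending just before the verb and is judged on whether it assigns higher probability to the verb form that agrees with the subject. The model, vocabulary, and example sentence are hypothetical placeholders (untrained, random weights); this is not the paper's influence-path method, which goes further and traces how the agreement signal is carried by specific gates and neurons.

```python
# Minimal sketch (not the paper's code): targeted evaluation of subject-verb
# number agreement with an LSTM language model. Everything below is a toy
# placeholder; a real study would load a trained word-level LSTM LM.
import torch
import torch.nn as nn

# Hypothetical toy vocabulary.
vocab = ["<unk>", "the", "keys", "to", "cabinet", "is", "are"]
idx = {w: i for i, w in enumerate(vocab)}

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)  # next-token logits at every position

model = LSTMLanguageModel(len(vocab))
model.eval()

# Prefix with an "attractor" noun ("cabinet") between the plural subject and the verb.
prefix = ["the", "keys", "to", "the", "cabinet"]
ids = torch.tensor([[idx[w] for w in prefix]])

with torch.no_grad():
    logits = model(ids)[0, -1]               # logits for the token after the prefix
    log_probs = torch.log_softmax(logits, dim=-1)

# Agreement is counted as correct if the plural verb outscores the singular one.
plural = log_probs[idx["are"]].item()
singular = log_probs[idx["is"]].item()
print(f"log P(are) = {plural:.3f}, log P(is) = {singular:.3f}, "
      f"agreement {'correct' if plural > singular else 'incorrect'}")
```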
Related papers
- Large Language Models are Interpretable Learners [53.56735770834617]
In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge the gap between expressiveness and interpretability.
The pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts.
As the knowledge learned by an LSP (LLM-based symbolic program) is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable) and to other LLMs.
arXiv Detail & Related papers (2024-06-25T02:18:15Z)
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs).
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
- Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer [50.572974726351504]
We propose C-FNT, a novel end-to-end (E2E) model that incorporates class-based LMs into the factorized neural transducer (FNT).
In C-FNT, the LM score of named entities can be associated with the name class instead of its surface form.
The experimental results show that our proposed C-FNT significantly reduces error in named entities without hurting performance in general word recognition.
arXiv Detail & Related papers (2023-09-14T12:14:49Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative [7.03497683558609]
Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics.
We investigate the capability of pretrained language models (PLMs) to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC).
Our results show that all three investigated PLMs are able to recognise the structure of the CC but fail to use its meaning.
arXiv Detail & Related papers (2022-10-24T13:01:24Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Demystifying Neural Language Models' Insensitivity to Word-Order [7.72780997900827]
We investigate the insensitivity of natural language models to word-order by quantifying perturbations.
We find that neural language models rely on the local ordering of tokens more than on their global ordering.
arXiv Detail & Related papers (2021-07-29T13:34:20Z)
- SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI), or summarisation, has enabled them to be ranked among the best paradigms for addressing such tasks at present.
NLI is one of the best scenarios for testing these architectures, due to the knowledge required to understand complex sentences and to establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z)
- LSTMs Compose (and Learn) Bottom-Up [18.34617849764921]
Recent work in NLP shows that LSTM language models capture hierarchical structure in language data.
In contrast to existing work, we consider the *learning process* that leads to their compositional behavior.
We present a related measure of Decompositional Interdependence between word meanings in an LSTM, based on their gate interactions.
arXiv Detail & Related papers (2020-10-06T13:00:32Z)
- How much complexity does an RNN architecture need to learn syntax-sensitive dependencies? [9.248882589228089]
Long short-term memory (LSTM) networks are capable of encapsulating long-range dependencies.
Simple recurrent networks (SRNs) have generally been less successful at capturing long-range dependencies.
We propose a new architecture, the Decay RNN, which incorporates the decaying nature of neuronal activations.
arXiv Detail & Related papers (2020-05-17T09:13:28Z)
- Attribution Analysis of Grammatical Dependencies in LSTMs [0.043512163406551986]
LSTM language models have been shown to capture syntax-sensitive grammatical dependencies with a high degree of accuracy.
We show that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns.
Our results suggest that LSTM language models are able to infer robust representations of syntactic dependencies.
arXiv Detail & Related papers (2020-04-30T19:19:37Z)
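For contrast with the influence-path approach, here is a minimal sketch of the diagnostic-classifier probing that the abstract above and several related papers rely on: a linear classifier is trained on LSTM hidden states to predict the subject's grammatical number. The hidden states and labels below are random placeholders (in a real study they would be extracted from a trained LM over annotated sentences); only the probing mechanics are shown.

```python
# Minimal diagnostic-classifier probe sketch with placeholder data, for
# illustration only. With real hidden states, above-chance accuracy is read
# as evidence that the number feature is linearly decodable; influence paths
# instead trace how that information flows through specific gates and neurons.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_dim, n_examples = 64, 1000

# Placeholder hidden states (one vector per sentence, taken at the pre-verb
# position) and binary labels: 0 = singular subject, 1 = plural subject.
states = rng.normal(size=(n_examples, hidden_dim))
labels = rng.integers(0, 2, size=n_examples)

train_x, test_x, train_y, test_y = train_test_split(states, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(train_x, train_y)

print("probe accuracy:", probe.score(test_x, test_y))
```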
This list is automatically generated from the titles and abstracts of the papers on this site.