Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models
- URL: http://arxiv.org/abs/2501.08618v1
- Date: Wed, 15 Jan 2025 06:34:34 GMT
- Title: Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models
- Authors: Aruna Sankaranarayanan, Dylan Hadfield-Menell, Aaron Mueller
- Abstract summary: We generate inputs using English, Italian, Japanese, or nonce words.
We observe that language models show distinct behaviors on hierarchical versus linearly structured inputs.
- Score: 16.129038982673432
- License:
- Abstract: All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are only sensitive to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the underlying grammars to conform to either hierarchical or linear/positional rules. Using these grammars, we first observe that language models show distinct behaviors on hierarchical versus linearly structured inputs. Then, we find that the components responsible for processing hierarchical grammars are distinct from those that process linear grammars; we causally verify this in ablation experiments. Finally, we observe that hierarchy-selective components are also active on nonce grammars; this suggests that hierarchy sensitivity is not tied to meaning, nor to in-distribution inputs.
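To make the contrast concrete, here is a minimal sketch, assuming GPT-2 via the Hugging Face transformers library, of how one might compare a model's preference for a hierarchy-governed versus a linearly governed agreement rule on a shared prefix. The template, the model choice, and the total-surprisal criterion are illustrative assumptions, not the paper's actual stimuli or evaluation code.

```python
# Minimal sketch (not the paper's code): compare a hierarchy-governed
# agreement rule ("the verb agrees with the structural subject") with a
# linear rule ("the verb agrees with the linearly closest noun") on a
# shared prefix, using total surprisal under a small causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def surprisal(text: str) -> float:
    """Total negative log-likelihood of the string under the model, in nats."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        mean_nll = model(ids, labels=ids).loss  # mean NLL over predicted tokens
    return mean_nll.item() * (ids.shape[1] - 1)

prefix = "The keys that the pilot holds"
hierarchical = prefix + " are on the table."  # agrees with "keys" (structural subject)
linear = prefix + " is on the table."         # agrees with "pilot" (closest noun)

s_hier, s_lin = surprisal(hierarchical), surprisal(linear)
print(f"hierarchical: {s_hier:.2f}  linear: {s_lin:.2f}")
print("model prefers the", "hierarchical" if s_hier < s_lin else "linear", "continuation")
```

The paper's causal step, ablating individual components and checking which grammar type degrades, could be layered on top of such a comparison (for example by zeroing selected attention-head outputs with forward hooks); it is not shown here.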
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar only in an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings (a toy version of this setup is sketched after this list).
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures.
We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z) - Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks [6.390468088226495]
We propose a new method to extract and explore significant fine-grained grammar patterns from treebanks.
We extract descriptions and rules across different languages for two linguistic phenomena, agreement and word order.
Our method captures both well-known and less well-known significant grammar rules in Spanish, French, and Wolof.
arXiv Detail & Related papers (2024-03-26T09:39:53Z) - Decoding Probing: Revealing Internal Linguistic Structures in Neural Language Models using Minimal Pairs [0.873811641236639]
We introduce a novel 'decoding probing' method to probe internal linguistic characteristics in neural language models layer by layer.
By treating the language model as the 'brain' and its representations as 'neural activations', we decode grammaticality labels of minimal pairs from the intermediate layers' representations (a minimal layer-wise probe in this spirit is sketched after this list).
arXiv Detail & Related papers (2024-03-26T00:56:06Z) - How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases [28.58785395946639]
We show that pre-training can teach language models to rely on hierarchical syntactic features when performing tasks after fine-tuning.
We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus.
arXiv Detail & Related papers (2023-05-31T14:38:14Z) - Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic CFGs with hierarchical rules, capable of generating lengthy sentences (a toy CFG sampler in this spirit is sketched after this list).
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv Detail & Related papers (2023-05-23T04:28:16Z) - How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech [25.02822854434971]
We train LSTMs and Transformers on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus.
We find that both model types generalize in a way more consistent with an incorrect linear rule than the correct hierarchical rule.
These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
arXiv Detail & Related papers (2023-01-26T23:24:17Z) - Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z) - Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses [62.197912623223964]
We show a low-dimensional structure where language models and translation models smoothly interpolate between word embeddings, syntactic and semantic tasks, and future word embeddings.
We find that this representation embedding can predict how well each individual feature space maps to human brain responses to natural language stimuli recorded using fMRI.
This suggests that the embedding captures some part of the brain's natural language representation structure.
arXiv Detail & Related papers (2021-06-09T22:59:12Z) - VLGrammar: Grounded Grammar Induction of Vision and Language [86.88273769411428]
We study grounded grammar induction of vision and language in a joint learning framework.
We present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously.
arXiv Detail & Related papers (2021-03-24T04:05:08Z) - Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels [15.453888735879525]
In natural languages, words are used in association to construct sentences.
We design a deep neural network architecture that explicitly wires lower and higher linguistic components.
We show that our model, MHAL, learns to solve tasks at different levels of granularity simultaneously.
arXiv Detail & Related papers (2020-11-01T10:44:46Z)
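Below are small, self-contained sketches for three of the entries above; each is an illustrative toy under stated assumptions, not the cited paper's code. First, for "Training Neural Networks as Recognizers of Formal Languages": a tiny LSTM trained directly as a binary classifier of strings, here on the assumed toy language a^n b^n with made-up negative sampling and hyperparameters.

```python
# Toy sketch (not the cited paper's setup): train an LSTM directly as a
# binary recognizer of the assumed formal language {a^n b^n : n >= 1}.
import random
import torch
import torch.nn as nn

VOCAB = {"a": 0, "b": 1}

def sample(max_n: int = 8):
    """Return (string, label); negatives have mismatched counts and/or order."""
    n = random.randint(1, max_n)
    if random.random() < 0.5:
        return "a" * n + "b" * n, 1
    m = random.choice([k for k in range(1, max_n + 1) if k != n])
    chars = list("a" * n + "b" * m)
    random.shuffle(chars)
    return "".join(chars), 0

class Recognizer(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.rnn(self.emb(ids))
        return self.out(h[-1]).squeeze(-1)  # logit for "string is in the language"

model = Recognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):  # one string per step is enough for a toy demo
    s, y = sample()
    ids = torch.tensor([[VOCAB[c] for c in s]])
    loss = loss_fn(model(ids), torch.tensor([float(y)]))
    opt.zero_grad(); loss.backward(); opt.step()
```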
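Next, for "Decoding Probing": a layer-wise logistic-regression probe over mean-pooled GPT-2 hidden states that tries to separate the grammatical from the ungrammatical member of minimal pairs. The model, the pooling choice, and the two toy pairs are assumptions; a real experiment would use a proper minimal-pair benchmark and held-out evaluation.

```python
# Illustrative sketch (not the cited paper's pipeline): fit a logistic-regression
# probe on each layer's mean-pooled GPT-2 hidden states to classify the
# grammaticality of minimal pairs. The pairs below are toy assumptions.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

pairs = [  # (grammatical, ungrammatical)
    ("The dogs near the house bark.", "The dogs near the house barks."),
    ("She has eaten the apple.", "She has ate the apple."),
]

def layer_features(sentence: str):
    """Mean-pooled hidden state per layer (embeddings plus each transformer block)."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states
    return [h.mean(dim=1).squeeze(0).numpy() for h in hidden]

X_by_layer, labels = None, []
for good, bad in pairs:
    for sent, label in ((good, 1), (bad, 0)):
        feats = layer_features(sent)
        X_by_layer = X_by_layer or [[] for _ in feats]
        for i, f in enumerate(feats):
            X_by_layer[i].append(f)
        labels.append(label)

for i, X in enumerate(X_by_layer):
    probe = LogisticRegression(max_iter=1000).fit(np.array(X), labels)
    print(f"layer {i:2d}: training accuracy {probe.score(np.array(X), labels):.2f}")
```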
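Finally, for "Physics of Language Models: Part 1": a toy context-free grammar sampler whose recursive rules yield nested (hierarchical) strings. The specific rules and vocabulary are made up for illustration and are far simpler than the synthetic CFG family the paper studies.

```python
# Toy sketch in the spirit of the synthetic-CFG idea: a tiny context-free
# grammar whose recursive rules yield nested (hierarchical) strings.
import random

RULES = {
    "S": [["NP", "VP"]],
    "NP": [["det", "N"]],
    "N": [["noun"], ["noun", "that", "VP"]],  # recursion gives nesting
    "VP": [["verb", "NP"], ["verb"]],
}
TERMINALS = {
    "det": ["the", "a"],
    "noun": ["bird", "cat", "dog"],
    "that": ["that"],
    "verb": ["sees", "chases", "sleeps"],
}

def sample(symbol: str = "S", depth: int = 0, max_depth: int = 6) -> list[str]:
    if symbol in TERMINALS:
        return [random.choice(TERMINALS[symbol])]
    # near the depth limit, take the shortest expansion so sampling terminates
    options = RULES[symbol] if depth < max_depth else [min(RULES[symbol], key=len)]
    expansion = random.choice(options)
    return [w for child in expansion for w in sample(child, depth + 1, max_depth)]

print(" ".join(sample()))  # e.g. "the cat that chases a dog sees the bird"
```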
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.