Are formal and functional linguistic mechanisms dissociated in language models?
- URL: http://arxiv.org/abs/2503.11302v3
- Date: Fri, 21 Mar 2025 13:15:27 GMT
- Title: Are formal and functional linguistic mechanisms dissociated in language models?
- Authors: Michael Hanna, Yonatan Belinkov, Sandro Pezzelle
- Abstract summary: Large language models (LLMs) excel at producing fluent, grammatical text, but struggle with functional linguistic tasks. Recent work suggests that to succeed on both formal and functional linguistic tasks, LLMs should use different mechanisms for each. We find that while there is indeed little overlap between circuits for formal and functional tasks, there is also little overlap between formal linguistic tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although large language models (LLMs) are increasingly capable, these capabilities are unevenly distributed: they excel at formal linguistic tasks, such as producing fluent, grammatical text, but struggle more with functional linguistic tasks like reasoning and consistent fact retrieval. Inspired by neuroscience, recent work suggests that to succeed on both formal and functional linguistic tasks, LLMs should use different mechanisms for each; such localization could either be built-in or emerge spontaneously through training. In this paper, we ask: do current models, with fast-improving functional linguistic abilities, exhibit distinct localization of formal and functional linguistic mechanisms? We answer this by finding and comparing the "circuits", or minimal computational subgraphs, responsible for various formal and functional tasks. Comparing 5 LLMs across 10 distinct tasks, we find that while there is indeed little overlap between circuits for formal and functional tasks, there is also little overlap between circuits for different formal linguistic tasks, unlike the unified formal language network found in the human brain. Thus, a single formal linguistic network, unified and distinct from functional task circuits, remains elusive. However, in terms of cross-task faithfulness (the ability of one circuit to solve another's task), we observe a separation between formal and functional mechanisms, suggesting that shared mechanisms between formal tasks may exist.
Related papers
- The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models [40.128112851978116]
We study how different prompting methods affect the geometry of representations in language models. Our analysis highlights the critical role of input distribution samples and label semantics in few-shot in-context learning. Our work contributes to the theoretical understanding of large language models and lays the groundwork for developing more effective, representation-aware prompting strategies.
arXiv Detail & Related papers (2025-02-11T23:09:50Z)
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
- The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units [16.317199232071232]
Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature.
In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing.
We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience.
arXiv Detail & Related papers (2024-11-04T17:09:10Z)
- How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
- Feature Interactions Reveal Linguistic Structure in Language Models [2.0178765779788495]
We study feature interactions in the context of feature attribution methods for post-hoc interpretability.
We work out a grey box methodology, in which we train models to perfection on a formal language classification task.
We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
arXiv Detail & Related papers (2023-06-21T11:24:41Z)
- Dissociating language and thought in large language models [52.39241645471213]
Large Language Models (LLMs) have come closest among all models to date to mastering human language.
We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms.
Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty.
arXiv Detail & Related papers (2023-01-16T22:41:19Z)
- Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses [62.197912623223964]
We show a low-dimensional structure where language models and translation models smoothly interpolate between word embeddings, syntactic and semantic tasks, and future word embeddings.
We find that this representation embedding can predict how well each individual feature space maps to human brain responses to natural language stimuli recorded using fMRI.
This suggests that the embedding captures some part of the brain's natural language representation structure.
arXiv Detail & Related papers (2021-06-09T22:59:12Z)
- A Matter of Framing: The Impact of Linguistic Formalism on Probing Results [69.36678873492373]
Deep pre-trained contextualized encoders like BERT (Devlin et al.) demonstrate remarkable performance on a range of downstream tasks.
Recent research in probing investigates the linguistic knowledge implicitly learned by these models during pre-training.
Can the choice of formalism affect probing results?
We find linguistically meaningful differences in the encoding of semantic role- and proto-role information by BERT depending on the formalism.
arXiv Detail & Related papers (2020-04-30T17:45:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.