Dissociating language and thought in large language models
- URL: http://arxiv.org/abs/2301.06627v3
- Date: Sat, 23 Mar 2024 19:52:33 GMT
- Title: Dissociating language and thought in large language models
- Authors: Kyle Mahowald, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko,
- Abstract summary: Large Language Models (LLMs) have come closest among all models to date to mastering human language.
We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms.
Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty.
- Score: 52.39241645471213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence - knowledge of linguistic rules and patterns - and functional linguistic competence - understanding and using language in the world. We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. We posit that models that use language in human-like ways would need to master both of these competence types, which, in turn, could require the emergence of mechanisms specialized for formal linguistic competence, distinct from functional competence.
Related papers
- MaestroMotif: Skill Design from Artificial Intelligence Feedback [67.17724089381056]
MaestroMotif is a method for AI-assisted skill design, which yields high-performing and adaptable agents.
arXiv Detail & Related papers (2024-12-11T16:59:31Z)
- The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units [16.317199232071232]
Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature.
In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing.
We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience.
arXiv Detail & Related papers (2024-11-04T17:09:10Z)
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach for enhancing the multilingual capabilities of large language models (LLMs).
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
- Unveiling Language Competence Neurons: A Psycholinguistic Approach to Model Interpretability [2.672177830116334]
This study employs psycholinguistic paradigms in English to explore neuron-level representations in language models.
Our findings indicate that while GPT-2-XL struggles with the sound-shape task, it demonstrates human-like abilities in both sound-gender association and implicit causality.
arXiv Detail & Related papers (2024-09-24T07:40:33Z)
- Language Models as Models of Language [0.0]
This chapter critically examines the potential contributions of modern language models to theoretical linguistics.
I review a growing body of empirical evidence suggesting that language models can learn hierarchical syntactic structure and exhibit sensitivity to various linguistic phenomena.
I conclude that closer collaboration between theoretical linguists and computational researchers could yield valuable insights.
arXiv Detail & Related papers (2024-08-13T18:26:04Z)
- Comuniqa: Exploring Large Language Models for improving speaking skills [2.8227892155844088]
We investigate the potential of Large Language Models (LLMs) to improve English speaking skills.
Recent advancements in Artificial Intelligence (AI) offer promising solutions to overcome limitations.
We propose Comuniqa, a novel LLM-based system designed to enhance English speaking skills.
arXiv Detail & Related papers (2024-01-28T07:37:33Z)
- How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
- Unveiling A Core Linguistic Region in Large Language Models [49.860260050718516]
This paper conducts analogical research using brain localization as a prototype.
We have discovered a core region in large language models that corresponds to linguistic competence.
We observe that an improvement in linguistic competence is not necessarily accompanied by an increase in the model's knowledge level.
arXiv Detail & Related papers (2023-10-23T13:31:32Z)
- Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.