Related papers: Development of Cognitive Intelligence in Pre-trained Language Models

Development of Cognitive Intelligence in Pre-trained Language Models

URL: http://arxiv.org/abs/2407.01047v3
Date: Fri, 12 Jul 2024 05:47:31 GMT
Title: Development of Cognitive Intelligence in Pre-trained Language Models
Authors: Raj Sanjay Shah, Khushi Bhardwaj, Sashank Varma,
Abstract summary: Recent studies show evidence for emergent cognitive abilities in Large Pre-trained Language Models. The developmental trajectories of PLMs consistently exhibit a window of maximal alignment to human cognitive development. After that window, training appears to serve the engineering goal of reducing loss but not the scientific goal of increasing alignment with human cognition.
Score: 3.1815791977708834
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent studies show evidence for emergent cognitive abilities in Large Pre-trained Language Models (PLMs). The increasing cognitive alignment of these models has made them candidates for cognitive science theories. Prior research into the emergent cognitive abilities of PLMs has largely been path-independent to model training, i.e., has focused on the final model weights and not the intermediate steps. However, building plausible models of human cognition using PLMs would benefit from considering the developmental alignment of their performance during training to the trajectories of children's thinking. Guided by psychometric tests of human intelligence, we choose four sets of tasks to investigate the alignment of ten popular families of PLMs and evaluate their available intermediate and final training steps. These tasks are Numerical ability, Linguistic abilities, Conceptual understanding, and Fluid reasoning. We find a striking regularity: regardless of model size, the developmental trajectories of PLMs consistently exhibit a window of maximal alignment to human cognitive development. Before that window, training appears to endow "blank slate" models with the requisite structure to be poised to rapidly learn from experience. After that window, training appears to serve the engineering goal of reducing loss but not the scientific goal of increasing alignment with human cognition.

Related papers

Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions [11.40240971657506]
In this work, we explore the potential of pretrained large language models to serve as dual-purpose cognitive models.<n>We employ reinforcement learning with outcome-based rewards to guide LLMs toward generating explicit reasoning traces for explaining human risky choices.
arXiv Detail & Related papers (2025-05-16T18:22:05Z)
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning [58.86928947970342]
Embodied-R is a framework combining large-scale Vision-Language Models for perception and small-scale Language Models for reasoning. After training on only 5k embodied video samples, Embodied-R with a 3B LM matches state-of-the-art multimodal reasoning models. Embodied-R also exhibits emergent thinking patterns such as systematic analysis and contextual integration.
arXiv Detail & Related papers (2025-04-17T06:16:11Z)
The potential -- and the pitfalls -- of using pre-trained language models as cognitive science theories [2.6549754445378344]
We discuss challenges to the use of PLMs as cognitive science theories. We review assumptions used by researchers to map measures of PLM performance to measures of human performance. We end by enumerating criteria for using PLMs as credible accounts of cognition and cognitive development.
arXiv Detail & Related papers (2025-01-22T05:24:23Z)
Can Language Models Learn to Skip Steps? [59.84848399905409]
We study the ability to skip steps in reasoning. Unlike humans, who may skip steps to enhance efficiency or to reduce cognitive load, models do not possess such motivations. Our work presents the first exploration into human-like step-skipping ability.
arXiv Detail & Related papers (2024-11-04T07:10:24Z)
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data [28.900987544062257]
We introduce BIG5-CHAT, a large-scale dataset containing 100,000 dialogues designed to ground models in how humans express their personality in text. Our methods prompting outperform on personality assessments such as BFI and IPIP-NEO, with trait correlations more closely matching human data. Our experiments reveal that models trained to exhibit higher conscientiousness, higher agreeableness, lower extraversion, and lower neuroticism display better performance on reasoning tasks.
arXiv Detail & Related papers (2024-10-21T20:32:27Z)
Judgment of Learning: A Human Ability Beyond Generative Artificial Intelligence [0.0]
Large language models (LLMs) increasingly mimic human cognition in various language-based tasks. We introduce a cross-agent prediction model to assess whether ChatGPT-based LLMs align with human judgments of learning (JOL) Our results revealed that while human JOL reliably predicted actual memory performance, none of the tested LLMs demonstrated comparable predictive accuracy.
arXiv Detail & Related papers (2024-10-17T09:42:30Z)
CogniDual Framework: Self-Training Large Language Models within a Dual-System Theoretical Framework for Improving Cognitive Tasks [39.43278448546028]
Kahneman's dual-system theory elucidates the human decision-making process, distinguishing between the rapid, intuitive System 1 and the deliberative, rational System 2. Recent advancements have positioned large language Models (LLMs) as formidable tools nearing human-level proficiency in various cognitive tasks. This study introduces the textbfCogniDual Framework for LLMs (CFLLMs), designed to assess whether LLMs can, through self-training, evolve from deliberate deduction to intuitive responses.
arXiv Detail & Related papers (2024-09-05T09:33:24Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development. We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. The lack of interpretability due to their black-box'' nature poses challenges for responsible implementation. We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
On the Unexpected Abilities of Large Language Models [0.0]
Large Language Models (LLMs) are capable of displaying a wide range of abilities that are not directly connected with the task for which they are trained. I discuss the nature of the indirect process that leads to the acquisition of these cognitive abilities, their relation to other indirect processes, and the implications for the acquisition of integrated abilities.
arXiv Detail & Related papers (2023-08-09T09:15:07Z)
Artificial Neuropsychology: Are Large Language Models Developing Executive Functions? [0.0]
We evaluate the planning function and working memory of GPT using the popular Towers of Hanoi method. Preliminary results show that LLMs generates near-optimal solutions in Towers of Hanoi related tasks. These abilities are quite limited and worse than well-trained humans when the tasks are not known.
arXiv Detail & Related papers (2023-05-06T20:53:22Z)
Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology. We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table. It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)
Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories. This trait vector then multiplicatively modulates the prediction mechanism via a fast weights' scheme in the prediction neural network. We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improves mindreading ability.
arXiv Detail & Related papers (2022-04-17T11:21:18Z)
AGENT: A Benchmark for Core Psychological Reasoning [60.35621718321559]
Intuitive psychology is the ability to reason about hidden mental variables that drive observable actions. Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning. We present a benchmark consisting of procedurally generated 3D animations, AGENT, structured around four scenarios.
arXiv Detail & Related papers (2021-02-24T14:58:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.