Exploring the LLM Journey from Cognition to Expression with Linear Representations
- URL: http://arxiv.org/abs/2405.16964v2
- Date: Fri, 08 Nov 2024 05:19:48 GMT
- Title: Exploring the LLM Journey from Cognition to Expression with Linear Representations
- Authors: Yuzi Yan, Jialian Li, Yipin Zhang, Dong Yan
- Abstract summary: This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs).
We define and explore the model's cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF).
Our findings unveil a sequential development pattern, where cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF.
- Score: 10.92882688742428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, an advanced bilingual (Chinese and English) LLM series. We define and explore the model's cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, similar to the neural signal processing in human cognition. Expressive capability is defined as the model's capability to produce word-level output. Our findings unveil a sequential development pattern, where cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF. Statistical analyses confirm a significant correlation between the two capabilities, suggesting that cognitive capacity may limit expressive potential. The paper also explores the theoretical underpinnings of these divergent developmental trajectories and their connection to the LLMs' architectural design. Moreover, we evaluate various optimization-independent strategies, such as few-shot learning and repeated sampling, which bridge the gap between cognitive and expressive capabilities. This research reveals the potential connection between the hidden space and the output space, contributing valuable insights into the interpretability and controllability of their training processes.
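The abstract describes cognitive capability in terms of the information carried by neuron output vectors and studies it through linear representations, but it does not spell out the procedure. As a rough illustration only, and not the authors' method, the sketch below fits a mean-difference linear probe on synthetic "hidden state" vectors and measures how well a single linear direction separates two classes. All names (`fit_mean_difference_probe`, `synthetic_hidden_states`, the dimensions and the shift size) are hypothetical choices made for this example.

```python
import random


def mean_vector(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fit_mean_difference_probe(hidden, labels):
    """Fit a linear probe whose direction is the difference of class means.

    The decision boundary passes through the midpoint of the two means.
    This is one simple way to test for a linear representation; it is an
    assumption of this sketch, not something stated in the abstract.
    """
    pos = [h for h, y in zip(hidden, labels) if y == 1]
    neg = [h for h, y in zip(hidden, labels) if y == 0]
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    direction = [p - n for p, n in zip(mu_pos, mu_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    bias = -dot(direction, midpoint)
    return direction, bias


def probe_accuracy(direction, bias, hidden, labels):
    """Fraction of vectors that the linear probe classifies correctly."""
    correct = sum(
        int(dot(direction, h) + bias > 0) == y for h, y in zip(hidden, labels)
    )
    return correct / len(labels)


def synthetic_hidden_states(n, dim, shift, rng):
    """Stand-in for real activations: Gaussian noise in every coordinate,
    with the label encoded as a shift along the first coordinate."""
    hidden, labels = [], []
    for i in range(n):
        y = i % 2
        h = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        h[0] += shift if y == 1 else -shift
        hidden.append(h)
        labels.append(y)
    return hidden, labels


rng = random.Random(0)
train_h, train_y = synthetic_hidden_states(400, 8, 2.0, rng)
test_h, test_y = synthetic_hidden_states(200, 8, 2.0, rng)

direction, bias = fit_mean_difference_probe(train_h, train_y)
accuracy = probe_accuracy(direction, bias, test_h, test_y)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

With real models, the synthetic vectors would be replaced by activations taken from a chosen layer at each training checkpoint. Comparing probe accuracy with the accuracy of the generated text is one plausible way to contrast what a model "knows" with what it can express.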
Related papers
- Lilith: Developmental Modular LLMs with Chemical Signaling [49.1574468325115]
Current paradigms in Artificial Intelligence rely on layers of feedforward networks which model brain activity at the neuronal level.
We propose LILITH, a novel architecture that combines developmental training of modular language models with brain-inspired token-based communication protocols.
arXiv Detail & Related papers (2025-07-06T23:18:51Z)
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study [50.065744358362345]
Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning.
Yet their learning ability, which is crucial for adapting to dynamic environments and acquiring new knowledge, remains underexplored.
arXiv Detail & Related papers (2025-06-16T13:24:50Z)
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing [62.447497430479174]
Drawing to reason in space is a novel paradigm that enables LVLMs to reason through elementary drawing operations in the visual space.
Our model, named VILASR, consistently outperforms existing methods across diverse spatial reasoning benchmarks.
arXiv Detail & Related papers (2025-06-11T17:41:50Z)
- Quantifying Cross-Modality Memorization in Vision-Language Models [86.82366725590508]
We study the unique characteristics of cross-modality memorization and conduct a systematic study centered on vision-language models.
Our results reveal that facts learned in one modality transfer to the other, but a significant gap exists between recalling information in the source and target modalities.
arXiv Detail & Related papers (2025-06-05T16:10:47Z)
- Dynamic Programming Techniques for Enhancing Cognitive Representation in Knowledge Tracing [125.75923987618977]
We propose the Cognitive Representation Dynamic Programming based Knowledge Tracing (CRDP-KT) model.
It is a dynamic programming algorithm to optimize cognitive representations based on the difficulty of the questions and the performance intervals between them.
It provides more accurate and systematic input features for subsequent model training, thereby minimizing distortion in the simulation of cognitive states.
arXiv Detail & Related papers (2025-06-03T14:44:48Z)
- Visual Large Language Models Exhibit Human-Level Cognitive Flexibility in the Wisconsin Card Sorting Test [5.346677002840565]
This study assesses the cognitive flexibility of state-of-the-art Visual Large Language Models (VLLMs).
Our results reveal that VLLMs achieve or surpass human-level set-shifting capabilities under chain-of-thought prompting with text-based inputs.
arXiv Detail & Related papers (2025-05-28T08:40:55Z)
- Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations [1.0485739694839669]
Large language models (LLMs) can sometimes report the strategies they actually use to solve tasks, but they can also fail to do so.
This suggests some degree of metacognition, that is, the capacity to monitor one's own cognitive processes for subsequent reporting and self-control.
We introduce a neuroscience-inspired neurofeedback paradigm designed to quantify the ability of LLMs to explicitly report and control their activation patterns.
arXiv Detail & Related papers (2025-05-19T22:32:25Z)
- Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision [22.553688605475333]
We show that brain-in-the-loop supervised learning can effectively transfer human conceptual structures to deep neural networks (DNNs).
Experimental results indicate that the enhanced cognitive capabilities lead to substantial performance gains in challenging tasks.
These findings highlight that human-in-the-loop supervision can effectively augment the complex cognitive abilities of large models.
arXiv Detail & Related papers (2025-05-14T02:39:10Z)
- Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning [58.86928947970342]
Embodied-R is a framework combining large-scale Vision-Language Models for perception and small-scale Language Models for reasoning.
After training on only 5k embodied video samples, Embodied-R with a 3B LM matches state-of-the-art multimodal reasoning models.
Embodied-R also exhibits emergent thinking patterns such as systematic analysis and contextual integration.
arXiv Detail & Related papers (2025-04-17T06:16:11Z)
- VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models [62.667142971664575]
We introduce VisFactor, a novel benchmark derived from the Factor-Referenced Cognitive Test (FRCT).
VisFactor digitalizes vision-related FRCT subtests to systematically evaluate MLLMs across essential visual cognitive tasks.
We present a comprehensive evaluation of state-of-the-art MLLMs, such as GPT-4o, Gemini-Pro, and Qwen-VL.
arXiv Detail & Related papers (2025-02-23T04:21:32Z)
- Neuron-based Personality Trait Induction in Large Language Models [115.08894603023712]
Large language models (LLMs) have become increasingly proficient at simulating various personality traits.
We present a neuron-based approach for personality trait induction in LLMs.
arXiv Detail & Related papers (2024-10-16T07:47:45Z)
- CogniDual Framework: Self-Training Large Language Models within a Dual-System Theoretical Framework for Improving Cognitive Tasks [39.43278448546028]
Kahneman's dual-system theory elucidates the human decision-making process, distinguishing between the rapid, intuitive System 1 and the deliberative, rational System 2.
Recent advancements have positioned large language models (LLMs) as formidable tools nearing human-level proficiency in various cognitive tasks.
This study introduces the CogniDual Framework for LLMs (CFLLMs), designed to assess whether LLMs can, through self-training, evolve from deliberate deduction to intuitive responses.
arXiv Detail & Related papers (2024-09-05T09:33:24Z)
- Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show both improved task performance and improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
- Large Language Models are Limited in Out-of-Context Knowledge Reasoning [65.72847298578071]
Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning.
This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which combines multiple pieces of knowledge to infer new knowledge.
arXiv Detail & Related papers (2024-06-11T15:58:59Z)
- Verbalized Probabilistic Graphical Modeling with Large Language Models [8.961720262676195]
This work introduces a novel Bayesian prompting approach that facilitates training-free Bayesian inference with large language models.
Our results indicate that the model effectively enhances confidence elicitation and text generation quality, demonstrating its potential to improve AI language understanding systems.
arXiv Detail & Related papers (2024-06-08T16:35:31Z)
- Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
arXiv Detail & Related papers (2024-02-20T14:43:39Z)
- The dynamic interplay between in-context and in-weight learning in humans and neural networks [15.744573869783972]
We show that emergent "in-context learning" (ICL) can equip neural networks with fundamentally different learning properties that can coexist with their native in-weight learning (IWL).
arXiv Detail & Related papers (2024-02-13T18:55:27Z)
- CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models [24.079412787914993]
We propose the concept of the cognitive dynamics of large language models (LLMs) and present a corresponding task inspired by longitudinal studies.
For this task, we develop CogBench, a novel benchmark to assess the cognitive dynamics of LLMs, and validate it through participant surveys.
We introduce CogGPT for the task, which features an innovative iterative cognitive mechanism aimed at enhancing lifelong cognitive dynamics.
arXiv Detail & Related papers (2024-01-06T03:59:59Z)
- A Novel Neural-symbolic System under Statistical Relational Learning [50.747658038910565]
We propose a general bi-level probabilistic graphical reasoning framework called GBPGR.
In GBPGR, the results of symbolic reasoning are utilized to refine and correct the predictions made by the deep learning models.
Our approach achieves high performance and exhibits effective generalization in both transductive and inductive tasks.
arXiv Detail & Related papers (2023-09-16T09:15:37Z)
- On the Unexpected Abilities of Large Language Models [0.0]
Large Language Models (LLMs) are capable of displaying a wide range of abilities that are not directly connected with the task for which they are trained.
I discuss the nature of the indirect process that leads to the acquisition of these cognitive abilities, their relation to other indirect processes, and the implications for the acquisition of integrated abilities.
arXiv Detail & Related papers (2023-08-09T09:15:07Z)
- Can Offline Reinforcement Learning Help Natural Language Understanding? [31.788133426611587]
We investigate the potential connection between offline reinforcement learning (RL) and language modeling (LM).
RL and LM are similar in that both predict the next state from the current and previous states, relying on both local and long-range dependencies across states.
Experimental results show that our RL pre-trained models perform close to models trained with the LM objective.
arXiv Detail & Related papers (2022-09-15T02:55:10Z)
- CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.