Contextual Feature Extraction Hierarchies Converge in Large Language
Models and the Brain
- URL: http://arxiv.org/abs/2401.17671v1
- Date: Wed, 31 Jan 2024 08:48:35 GMT
- Title: Contextual Feature Extraction Hierarchies Converge in Large Language
Models and the Brain
- Authors: Gavin Mischler, Yinghao Aaron Li, Stephan Bickel, Ashesh D. Mehta and
Nima Mesgarani
- Abstract summary: We show that as large language models (LLMs) achieve higher performance on benchmark tasks, they become more brain-like.
We also show the importance of contextual information in improving model performance and brain similarity.
- Score: 12.92793034617015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in artificial intelligence have sparked interest in the
parallels between large language models (LLMs) and human neural processing,
particularly in language comprehension. While prior research has established
similarities in the representation of LLMs and the brain, the underlying
computational principles that cause this convergence, especially in the context
of evolving LLMs, remain elusive. Here, we examined a diverse selection of
high-performance LLMs with similar parameter sizes to investigate the factors
contributing to their alignment with the brain's language processing
mechanisms. We find that as LLMs achieve higher performance on benchmark tasks,
they not only become more brain-like as measured by higher performance when
predicting neural responses from LLM embeddings, but also their hierarchical
feature extraction pathways map more closely onto the brain's while using fewer
layers to do the same encoding. We also compare the feature extraction pathways
of the LLMs to each other and identify new ways in which high-performing models
have converged toward similar hierarchical processing mechanisms. Finally, we
show the importance of contextual information in improving model performance
and brain similarity. Our findings reveal the converging aspects of language
processing in the brain and LLMs and offer new directions for developing models
that align more closely with human cognitive processing.
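The brain-likeness measure described in the abstract, predicting neural responses from LLM embeddings, is typically implemented as a cross-validated regularized encoding model. The sketch below is an illustrative reconstruction under that assumption, not the authors' exact pipeline; the toy data, ridge penalty, and scoring are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def encoding_score(embeddings, neural, alpha=1.0, n_splits=5):
    """Fit ridge regressions from embeddings to neural recordings and
    return the mean held-out Pearson correlation across channels."""
    fold_scores = []
    for train, test in KFold(n_splits=n_splits).split(embeddings):
        model = Ridge(alpha=alpha).fit(embeddings[train], neural[train])
        pred = model.predict(embeddings[test])
        # correlate each channel's predicted and recorded response
        rs = [np.corrcoef(pred[:, i], neural[test, i])[0, 1]
              for i in range(neural.shape[1])]
        fold_scores.append(np.mean(rs))
    return float(np.mean(fold_scores))

# Toy data: 200 word positions, 64-dim embeddings, 10 recording channels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
Y = X @ rng.normal(size=(64, 10)) + 0.1 * rng.normal(size=(200, 10))
print(round(encoding_score(X, Y), 2))
```

In analyses like the one above, this score would be computed per LLM layer, so that the best-predicting layer for each brain region traces out the hierarchical mapping the abstract refers to.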
Related papers
- Lost in Translation: The Algorithmic Gap Between LMs and the Brain [8.799971499357499]
Language Models (LMs) have achieved impressive performance on various linguistic tasks, but their relationship to human language processing in the brain remains unclear.
This paper examines the gaps and overlaps between LMs and the brain at different levels of analysis.
We discuss how insights from neuroscience, such as sparsity, modularity, internal states, and interactive learning, can inform the development of more biologically plausible language models.
arXiv Detail & Related papers (2024-07-05T17:43:16Z)
- Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system.
In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z)
- What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores [1.8175282137722093]
Internal representations from large language models (LLMs) achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing.
Here, we analyze three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages.
We find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings.
arXiv Detail & Related papers (2024-06-03T17:13:27Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying LLMs' truthfulness using the local intrinsic dimension (LID) of model activations.
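Local intrinsic dimension is commonly estimated with a maximum-likelihood estimator over nearest-neighbour distances (the Levina-Bickel estimator). Whether this paper uses exactly that estimator is an assumption here, and the data below are synthetic.

```python
import numpy as np

def local_intrinsic_dimension(activations, query, k=20):
    """Levina-Bickel maximum-likelihood estimate of the local intrinsic
    dimension around one point, from distances to its k nearest neighbours."""
    d = np.linalg.norm(activations - query, axis=1)
    d = np.sort(d[d > 0])[:k]          # k nearest non-identical points
    # MLE: inverse of the mean log-ratio of the k-th distance to the rest
    return (k - 1) / np.sum(np.log(d[-1] / d[:-1]))

# Toy check: points on a 2-D linear subspace of a 64-D space should
# yield an estimate near 2, regardless of the ambient dimensionality.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 64))
pts = rng.normal(size=(2000, 2)) @ basis
print(round(local_intrinsic_dimension(pts, pts[0], k=50), 1))
```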
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- Do Large Language Models Mirror Cognitive Language Processing? [43.68923267228057]
Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning.
In cognitive science, brain cognitive processing signals are typically utilized to study human language processing.
We employ Representational Similarity Analysis (RSA) to measure the alignment between 23 mainstream LLMs and fMRI signals of the brain.
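RSA compares two systems by correlating their representational dissimilarity matrices (RDMs) over a shared stimulus set. A minimal sketch of that idea, with hypothetical toy data standing in for LLM embeddings and fMRI responses:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(model_embeddings, brain_responses):
    """Correlate the pairwise-dissimilarity structure (RDM) of model
    embeddings with that of brain responses over the same stimuli."""
    rdm_model = pdist(model_embeddings, metric="correlation")
    rdm_brain = pdist(brain_responses, metric="correlation")
    rho, _ = spearmanr(rdm_model, rdm_brain)
    return float(rho)

# Toy data: 50 stimuli; the "brain" responses are a noisy linear readout
# of the embeddings, so the two RDMs should be positively correlated.
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 64))
brain = emb @ rng.normal(size=(64, 64)) + 0.1 * rng.normal(size=(50, 64))
print(round(rsa_score(emb, brain), 2))
```

Because RSA compares dissimilarity structures rather than raw features, it needs no fitted mapping between the two spaces, which makes it convenient for comparing many LLMs against the same fMRI data.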
arXiv Detail & Related papers (2024-02-28T03:38:20Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
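LAPE scores each neuron by the entropy of its activation probabilities across languages: a neuron that fires almost exclusively for one language gets low entropy. A minimal sketch of this idea, with made-up activation probabilities (the paper's exact activation criterion and thresholding are not reproduced here):

```python
import numpy as np

def lape(activation_probs):
    """Language Activation Probability Entropy: per neuron, the entropy
    of its normalized activation probabilities across languages.
    Low entropy means the neuron fires mostly for one language."""
    p = activation_probs / activation_probs.sum(axis=0, keepdims=True)
    p = np.clip(p, 1e-12, 1.0)           # guard against log(0)
    return -(p * np.log(p)).sum(axis=0)

# Toy example: 3 languages x 4 neurons. Neuron 0 fires almost only for
# language 0 (language-specific); neuron 3 fires equally for all three.
probs = np.array([[0.90, 0.40, 0.10, 0.30],
                  [0.05, 0.35, 0.50, 0.30],
                  [0.05, 0.25, 0.40, 0.30]])
scores = lape(probs)
print(scores.round(2))  # neuron 0 has the lowest entropy
```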
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
arXiv Detail & Related papers (2024-02-20T14:43:39Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Instruction-tuning Aligns LLMs to the Human Brain [20.86703074354748]
Instruction-tuning enables large language models to generate output that more closely resembles human responses to natural language queries.
We investigate whether instruction-tuning makes large language models more similar to how humans process language.
We find that instruction-tuning generally enhances brain alignment by an average of 6%, but does not have a similar effect on behavioral alignment.
arXiv Detail & Related papers (2023-12-01T17:43:16Z)
- Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View [60.80731090755224]
This paper probes the collaboration mechanisms among contemporary NLP systems through practical experiments combined with theoretical insights.
We fabricate four unique 'societies' comprised of LLM agents, where each agent is characterized by a specific 'trait' (easy-going or overconfident) and engages in collaboration with a distinct 'thinking pattern' (debate or reflection).
Our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring social psychology theories.
arXiv Detail & Related papers (2023-10-03T15:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.