Contextual Feature Extraction Hierarchies Converge in Large Language
Models and the Brain
- URL: http://arxiv.org/abs/2401.17671v1
- Date: Wed, 31 Jan 2024 08:48:35 GMT
- Title: Contextual Feature Extraction Hierarchies Converge in Large Language
Models and the Brain
- Authors: Gavin Mischler, Yinghao Aaron Li, Stephan Bickel, Ashesh D. Mehta and
Nima Mesgarani
- Abstract summary: We show that as large language models (LLMs) achieve higher performance on benchmark tasks, they become more brain-like.
We also show the importance of contextual information in improving model performance and brain similarity.
- Score: 12.92793034617015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in artificial intelligence have sparked interest in the
parallels between large language models (LLMs) and human neural processing,
particularly in language comprehension. While prior research has established
similarities in the representation of LLMs and the brain, the underlying
computational principles that cause this convergence, especially in the context
of evolving LLMs, remain elusive. Here, we examined a diverse selection of
high-performance LLMs with similar parameter sizes to investigate the factors
contributing to their alignment with the brain's language processing
mechanisms. We find that as LLMs achieve higher performance on benchmark tasks,
they not only become more brain-like as measured by higher performance when
predicting neural responses from LLM embeddings, but also their hierarchical
feature extraction pathways map more closely onto the brain's while using fewer
layers to do the same encoding. We also compare the feature extraction pathways
of the LLMs to each other and identify new ways in which high-performing models
have converged toward similar hierarchical processing mechanisms. Finally, we
show the importance of contextual information in improving model performance
and brain similarity. Our findings reveal the converging aspects of language
processing in the brain and LLMs and offer new directions for developing models
that align more closely with human cognitive processing.
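The brain-likeness measure described in the abstract, predicting neural responses from LLM embeddings, is typically implemented as a cross-validated regularized encoding model. The sketch below is an illustrative reconstruction under that assumption, not the authors' exact pipeline; the toy data, ridge penalty, and scoring are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def encoding_score(embeddings, neural, alpha=1.0, n_splits=5):
    """Fit ridge regressions from embeddings to neural recordings and
    return the mean held-out Pearson correlation across channels."""
    fold_scores = []
    for train, test in KFold(n_splits=n_splits).split(embeddings):
        model = Ridge(alpha=alpha).fit(embeddings[train], neural[train])
        pred = model.predict(embeddings[test])
        # correlate each channel's predicted and recorded response
        rs = [np.corrcoef(pred[:, i], neural[test, i])[0, 1]
              for i in range(neural.shape[1])]
        fold_scores.append(np.mean(rs))
    return float(np.mean(fold_scores))

# Toy data: 200 word positions, 64-dim embeddings, 10 recording channels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
Y = X @ rng.normal(size=(64, 10)) + 0.1 * rng.normal(size=(200, 10))
print(round(encoding_score(X, Y), 2))
```

In analyses like the one above, this score would be computed per LLM layer, so that the best-predicting layer for each brain region traces out the hierarchical mapping the abstract refers to.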
Related papers
- Lost in Translation: The Algorithmic Gap Between LMs and the Brain [8.799971499357499]
Language Models (LMs) have achieved impressive performance on various linguistic tasks, but their relationship to human language processing in the brain remains unclear.
This paper examines the gaps and overlaps between LMs and the brain at different levels of analysis.
We discuss how insights from neuroscience, such as sparsity, modularity, internal states, and interactive learning, can inform the development of more biologically plausible language models.
arXiv Detail & Related papers (2024-07-05T17:43:16Z)
- Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system.
In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z)
- What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores [1.8175282137722093]
Internal representations from large language models (LLMs) achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing.
Here, we analyze three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages.
We find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings.
arXiv Detail & Related papers (2024-06-03T17:13:27Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying LLMs' truthfulness using the local intrinsic dimension (LID) of model activations.
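Local intrinsic dimension is commonly estimated with a maximum-likelihood estimator over nearest-neighbour distances (the Levina-Bickel estimator). Whether this paper uses exactly that estimator is an assumption here, and the data below are synthetic.

```python
import numpy as np

def local_intrinsic_dimension(activations, query, k=20):
    """Levina-Bickel maximum-likelihood estimate of the local intrinsic
    dimension around one point, from distances to its k nearest neighbours."""
    d = np.linalg.norm(activations - query, axis=1)
    d = np.sort(d[d > 0])[:k]          # k nearest non-identical points
    # MLE: inverse of the mean log-ratio of the k-th distance to the rest
    return (k - 1) / np.sum(np.log(d[-1] / d[:-1]))

# Toy check: points on a 2-D linear subspace of a 64-D space should
# yield an estimate near 2, regardless of the ambient dimensionality.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 64))
pts = rng.normal(size=(2000, 2)) @ basis
print(round(local_intrinsic_dimension(pts, pts[0], k=50), 1))
```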
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- Do Large Language Models Mirror Cognitive Language Processing? [43.68923267228057]
Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning.
In cognitive science, brain cognitive processing signals are typically utilized to study human language processing.
We employ Representational Similarity Analysis (RSA) to measure the alignment between 23 mainstream LLMs and fMRI signals of the brain.
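RSA compares two systems by correlating their representational dissimilarity matrices (RDMs) over a shared stimulus set. A minimal sketch of that idea, with hypothetical toy data standing in for LLM embeddings and fMRI responses:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(model_embeddings, brain_responses):
    """Correlate the pairwise-dissimilarity structure (RDM) of model
    embeddings with that of brain responses over the same stimuli."""
    rdm_model = pdist(model_embeddings, metric="correlation")
    rdm_brain = pdist(brain_responses, metric="correlation")
    rho, _ = spearmanr(rdm_model, rdm_brain)
    return float(rho)

# Toy data: 50 stimuli; the "brain" responses are a noisy linear readout
# of the embeddings, so the two RDMs should be positively correlated.
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 64))
brain = emb @ rng.normal(size=(64, 64)) + 0.1 * rng.normal(size=(50, 64))
print(round(rsa_score(emb, brain), 2))
```

Because RSA compares dissimilarity structures rather than raw features, it needs no fitted mapping between the two spaces, which makes it convenient for comparing many LLMs against the same fMRI data.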
arXiv Detail & Related papers (2024-02-28T03:38:20Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
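LAPE scores each neuron by the entropy of its activation probabilities across languages: a neuron that fires almost exclusively for one language gets low entropy. A minimal sketch of this idea, with made-up activation probabilities (the paper's exact activation criterion and thresholding are not reproduced here):

```python
import numpy as np

def lape(activation_probs):
    """Language Activation Probability Entropy: per neuron, the entropy
    of its normalized activation probabilities across languages.
    Low entropy means the neuron fires mostly for one language."""
    p = activation_probs / activation_probs.sum(axis=0, keepdims=True)
    p = np.clip(p, 1e-12, 1.0)           # guard against log(0)
    return -(p * np.log(p)).sum(axis=0)

# Toy example: 3 languages x 4 neurons. Neuron 0 fires almost only for
# language 0 (language-specific); neuron 3 fires equally for all three.
probs = np.array([[0.90, 0.40, 0.10, 0.30],
                  [0.05, 0.35, 0.50, 0.30],
                  [0.05, 0.25, 0.40, 0.30]])
scores = lape(probs)
print(scores.round(2))  # neuron 0 has the lowest entropy
```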
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- Identifying Semantic Induction Heads to Understand In-Context Learning [103.00463655766066]
We investigate whether attention heads encode two types of relationships between tokens present in natural languages.
We find that certain attention heads exhibit a pattern where, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens.
arXiv Detail & Related papers (2024-02-20T14:43:39Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Instruction-tuning Aligns LLMs to the Human Brain [20.86703074354748]
Instruction-tuning enables large language models to generate output that more closely resembles human responses to natural language queries.
We investigate whether instruction-tuning makes large language models more similar to how humans process language.
We find that instruction-tuning generally enhances brain alignment by an average of 6%, but does not have a similar effect on behavioral alignment.
arXiv Detail & Related papers (2023-12-01T17:43:16Z)
- Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View [60.80731090755224]
This paper probes the collaboration mechanisms among contemporary NLP systems through practical experiments combined with theoretical insights.
We fabricate four unique 'societies' comprised of LLM agents, where each agent is characterized by a specific 'trait' (easy-going or overconfident) and engages in collaboration with a distinct 'thinking pattern' (debate or reflection).
Our results further illustrate that LLM agents manifest human-like social behaviors, such as conformity and consensus reaching, mirroring social psychology theories.
arXiv Detail & Related papers (2023-10-03T15:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.