Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans
- URL: http://arxiv.org/abs/2502.15090v1
- Date: Thu, 20 Feb 2025 23:08:03 GMT
- Title: Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans
- Authors: Masha Fedzechkina, Eleonora Gualdoni, Sinead Williamson, Katherine Metcalf, Skyler Seto, Barry-John Theobald
- Abstract summary: This work introduces a novel approach to the study of representation alignment. We adopt a method from research on activation steering to identify neurons responsible for specific concepts. Our findings reveal that LLM representations closely align with human representations inferred from behavioral data.
- Score: 3.431979707540646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern large language models (LLMs) achieve impressive performance on some tasks, while exhibiting distinctly non-human-like behaviors on others. This raises the question of how well the LLM's learned representations align with human representations. In this work, we introduce a novel approach to the study of representation alignment: we adopt a method from research on activation steering to identify neurons responsible for specific concepts (e.g., 'cat') and then analyze the corresponding activation patterns. Our findings reveal that LLM representations closely align with human representations inferred from behavioral data. Notably, this alignment surpasses that of word embeddings, which have been center stage in prior work on human and model alignment. Additionally, our approach enables a more granular view of how LLMs represent concepts. Specifically, we show that LLMs organize concepts in a way that reflects hierarchical relationships interpretable to humans (e.g., 'animal'-'dog').
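The pipeline described in the abstract (identify concept-selective neurons from activation differences, then compare neuron-level concept representations with human behavioral data) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' released code: it assumes activations have already been extracted for prompts that do and do not mention a concept, and it uses a simple representational-similarity (Spearman) score as the alignment metric, which is an assumption rather than a detail taken from the paper.

```python
# Hedged sketch: select concept-responsive neurons from activation
# differences, then score alignment between the model's concept-by-concept
# similarity matrix and a human similarity matrix from behavioral data.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def concept_neurons(acts_with, acts_without, top_k=50):
    """Indices of the neurons whose mean activation differs most between
    prompts that contain the concept (e.g., 'cat') and prompts that do not.

    acts_with, acts_without: arrays of shape (n_prompts, n_neurons).
    """
    diff = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return np.argsort(np.abs(diff))[-top_k:]

def model_similarity(concept_acts, neuron_idx):
    """Cosine-similarity matrix over concepts, restricted to selected neurons.

    concept_acts: array of shape (n_concepts, n_neurons), one mean
    activation vector per concept.
    """
    sub = concept_acts[:, neuron_idx]
    return 1.0 - squareform(pdist(sub, metric="cosine"))

def alignment_score(model_sim, human_sim):
    """Spearman correlation between the upper-triangular entries of the
    model and human concept-similarity matrices (an RSA-style score)."""
    iu = np.triu_indices_from(model_sim, k=1)
    return spearmanr(model_sim[iu], human_sim[iu]).correlation
```

In a full pipeline, the activation arrays would come from a hidden layer of the LLM (for example, via `output_hidden_states=True` in the Hugging Face transformers API), and `human_sim` would be estimated from behavioral similarity judgments; both of these choices are illustrative assumptions here.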
Related papers
- Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis [19.032828729570458]
We use established principles and explanations from psychology and cognitive science related to complexity in human visual perception.
Our study aims to benchmark MLLMs across various explainability principles relevant to visual perception.
arXiv Detail & Related papers (2025-04-16T22:14:27Z)
- Waking Up an AI: A Quantitative Framework for Prompt-Induced Phase Transition in Large Language Models [0.0]
We propose a two-part framework to investigate what underlies intuitive human thinking.
Regarding conceptual fusion, current LLMs showed no significant difference in responsiveness between semantically fused and non-fused prompts.
Our method may help illuminate key differences in how intuition and conceptual leaps emerge in artificial versus human minds.
arXiv Detail & Related papers (2025-04-16T06:49:45Z)
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [79.01538178959726]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence.
We introduce a novel generative model that generates tokens on the basis of human interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
- How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition [75.11808682808065]
This study investigates whether large language models (LLMs) exhibit similar tendencies in understanding semantic size.
Our findings reveal that multi-modal training is crucial for LLMs to achieve more human-like understanding.
Lastly, we examine whether LLMs are influenced by attention-grabbing headlines with larger semantic sizes in a real-world web shopping scenario.
arXiv Detail & Related papers (2025-03-01T03:35:56Z)
- Human-like conceptual representations emerge from language prediction [72.5875173689788]
We investigated the emergence of human-like conceptual representations within large language models (LLMs). We found that LLMs were able to infer concepts from definitional descriptions and construct representation spaces that converge towards a shared, context-independent structure. Our work supports the view that LLMs serve as valuable tools for understanding complex human cognition and paves the way for better alignment between artificial and human intelligence.
arXiv Detail & Related papers (2025-01-21T23:54:17Z)
- Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning).
Traditional psycholinguistic evaluations often reflect statistical biases that may misrepresent LLMs' true linguistic capabilities.
We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.
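As a rough illustration of this style of analysis (a hedged sketch under an assumed data layout, not the paper's implementation), a diagnostic probe can be trained per layer to separate the two members of each minimal pair:

```python
# Hedged sketch of layer-wise minimal-pair probing. Assumed input format:
# activations[layer] has shape (n_pairs, 2, n_dims), where index 0 holds the
# acceptable sentence of each minimal pair and index 1 the unacceptable one.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_layer(pair_acts):
    """Cross-validated accuracy of a linear probe that separates the
    acceptable from the unacceptable member of each minimal pair."""
    X = np.concatenate([pair_acts[:, 0], pair_acts[:, 1]])
    y = np.concatenate([np.ones(len(pair_acts)), np.zeros(len(pair_acts))])
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

def probe_all_layers(activations):
    """Probe accuracy per layer; the layers where accuracy peaks indicate
    where the relevant distinction is most linearly decodable."""
    return {layer: probe_layer(acts) for layer, acts in activations.items()}
```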
arXiv Detail & Related papers (2024-11-12T04:16:44Z)
- Mind Scramble: Unveiling Large Language Model Psychology Via Typoglycemia [27.650551131885152]
Research into large language models (LLMs) has shown promise in addressing complex tasks in the physical world.
Studies suggest that powerful LLMs, like GPT-4, are beginning to exhibit human-like cognitive abilities.
arXiv Detail & Related papers (2024-10-02T15:47:25Z)
- Divergences between Language Models and Human Brains [59.100552839650774]
We systematically explore the divergences between human and machine language processing. We identify two domains that LMs do not capture well: social/emotional intelligence and physical commonsense. Our results show that fine-tuning LMs on these domains can improve their alignment with human brain responses.
arXiv Detail & Related papers (2023-11-15T19:02:40Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Towards Concept-Aware Large Language Models [56.48016300758356]
Concepts play a pivotal role in various human cognitive functions, including learning, reasoning and communication.
There is very little work on endowing machines with the ability to form and reason with concepts.
In this work, we analyze how well contemporary large language models (LLMs) capture human concepts and their structure.
arXiv Detail & Related papers (2023-11-03T12:19:22Z)
- Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or producing factually incorrect information.
This survey presents a comprehensive overview of alignment technologies that address these limitations.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)
- Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models [4.412336603162406]
Large Language Models (LLMs) do not differentially represent numbers, which are pervasive in text.
In this work, we investigate how well popular LLMs capture the magnitudes of numbers from a behavioral lens.
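As a toy illustration of this behavioral lens (a minimal sketch under assumed inputs, not the paper's benchmark), one can test for a human-like distance effect: representations of numbers that are closer in magnitude should be more similar, which predicts a negative correlation below.

```python
# Hedged sketch of a numerical distance-effect check. Assumed input:
# number_vecs maps an integer n to some model-derived vector for n
# (e.g., an embedding or hidden-state representation extracted elsewhere).
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

def distance_effect(number_vecs):
    """Spearman correlation between pairwise cosine similarity and
    numerical distance; a human-like distance effect is negative."""
    sims, dists = [], []
    for a, b in combinations(sorted(number_vecs), 2):
        va, vb = number_vecs[a], number_vecs[b]
        sims.append(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))
        dists.append(abs(a - b))
    return spearmanr(sims, dists).correlation
```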
arXiv Detail & Related papers (2023-05-18T07:50:44Z)