Human-like object concept representations emerge naturally in multimodal large language models
- URL: http://arxiv.org/abs/2407.01067v1
- Date: Mon, 1 Jul 2024 08:17:19 GMT
- Title: Human-like object concept representations emerge naturally in multimodal large language models
- Authors: Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He,
- Abstract summary: We combined behavioral and neuroimaging analysis methods to uncover how the object concept representations in Large Language Models correlate with those of humans.
The resulting 66-dimensional embeddings were found to be highly stable and predictive, and exhibited semantic clustering akin to human mental representations.
This study advances our understanding of machine intelligence and informs the development of more human-like artificial cognitive systems.
- Score: 24.003766123531545
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the attractive question of whether these models can also develop human-like object representations through exposure to vast amounts of linguistic and multimodal data. In this study, we combined behavioral and neuroimaging analysis methods to uncover how the object concept representations in LLMs correlate with those of humans. By collecting large-scale datasets of 4.7 million triplet judgments from LLM and Multimodal LLM (MLLM), we were able to derive low-dimensional embeddings that capture the underlying similarity structure of 1,854 natural objects. The resulting 66-dimensional embeddings were found to be highly stable and predictive, and exhibited semantic clustering akin to human mental representations. Interestingly, the interpretability of the dimensions underlying these embeddings suggests that LLM and MLLM have developed human-like conceptual representations of natural objects. Further analysis demonstrated strong alignment between the identified model embeddings and neural activity patterns in many functionally defined brain ROIs (e.g., EBA, PPA, RSC and FFA). This provides compelling evidence that the object representations in LLMs, while not identical to those in the human, share fundamental commonalities that reflect key schemas of human conceptual knowledge. This study advances our understanding of machine intelligence and informs the development of more human-like artificial cognitive systems.
Related papers
- Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning [0.0]
Large Language Models (LLMs) have demonstrated their capabilities across various tasks.
This paper exploits the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks.
We compare the performance of LLMs with a cognitive instance-based learning model, which imitates human experiential decision-making.
arXiv Detail & Related papers (2024-07-12T14:13:06Z) - Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models [71.93366651585275]
We propose visualization-of-Thought (VoT) prompting for large language models (LLMs)
VoT elicits spatial reasoning of LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps.
We employ VoT for multi-hop spatial reasoning tasks, including natural language navigation, visual navigation, and visual tiling in 2D grid worlds.
arXiv Detail & Related papers (2024-04-04T17:45:08Z) - What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models [50.97705264224828]
We propose Counterfactual Inception, a novel method that implants counterfactual thinking into Large Multi-modal Models.
We aim for the models to engage with and generate responses that span a wider contextual scene understanding.
Comprehensive analyses across various LMMs, including both open-source and proprietary models, corroborate that counterfactual thinking significantly reduces hallucination.
arXiv Detail & Related papers (2024-03-20T11:27:20Z) - Comparing Rationality Between Large Language Models and Humans: Insights and Open Questions [6.201550639431176]
This paper focuses on the burgeoning prominence of large language models (LLMs)
We underscore the pivotal role of Reinforcement Learning from Human Feedback (RLHF) in augmenting LLMs' rationality and decision-making prowess.
arXiv Detail & Related papers (2024-03-14T18:36:04Z) - Human Simulacra: Benchmarking the Personification of Large Language Models [38.21708264569801]
Large language models (LLMs) are recognized as systems that closely mimic aspects of human intelligence.
This paper introduces a framework for constructing virtual characters' life stories from the ground up.
Experimental results demonstrate that our constructed simulacra can produce personified responses that align with their target characters.
arXiv Detail & Related papers (2024-02-28T09:11:14Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their black-box'' nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Detecting Any Human-Object Interaction Relationship: Universal HOI
Detector with Spatial Prompt Learning on Foundation Models [55.20626448358655]
This study explores the universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs)
Our design includes an HO Prompt-guided Decoder (HOPD), facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image.
For open-category interaction recognition, our method supports either of two input types: interaction phrase or interpretive sentence.
arXiv Detail & Related papers (2023-11-07T08:27:32Z) - Unveiling Theory of Mind in Large Language Models: A Parallel to Single
Neurons in the Human Brain [2.5350521110810056]
Large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM)
The precise processes underlying LLM's capacity for ToM or their similarities with that of humans remains largely unknown.
arXiv Detail & Related papers (2023-09-04T15:26:15Z) - Conceptual structure coheres in human cognition but not in large
language models [7.405352374343134]
We show that conceptual structure is robust to differences in culture, language, and method of estimation.
Results highlight an important difference between contemporary large language models and human cognition.
arXiv Detail & Related papers (2023-04-05T21:27:01Z) - Multimodal foundation models are better simulators of the human brain [65.10501322822881]
We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.
arXiv Detail & Related papers (2022-08-17T12:36:26Z) - WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.