Core Knowledge Deficits in Multi-Modal Language Models
- URL: http://arxiv.org/abs/2410.10855v3
- Date: Sun, 09 Mar 2025 04:39:42 GMT
- Title: Core Knowledge Deficits in Multi-Modal Language Models
- Authors: Yijiang Li, Qingying Gao, Tianwei Zhao, Bingyang Wang, Haoran Sun, Haiyun Lyu, Dezhi Luo, Hokin Deng,
- Abstract summary: We examine the hypothesis that deficiencies stem from the absence of core knowledge innate to humans from early childhood.<n>Our findings reveal core knowledge deficits in early developed core abilities while models demonstrate human comparable performance in high level cognition.<n>We introduce an evaluation technique, Concept Hacking, through which we demonstrate that MLLMs do not genuinely advance toward core knowledge.
- Score: 8.461561516444261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Multimodal Large Language Models (MLLMs) demonstrate impressive abilities over high level perception and reasoning, their robustness in the wild still lags behind humans and exhibits diminished efficacy on simple tasks that are intuitive for humans. We examine the hypothesis that these deficiencies stem from the absence of core knowledge, rudimentary cognitive abilities innate to humans from early childhood. To probe core knowledge representation in MLLMs, we draw from developmental cognitive sciences and develop a large-scale benchmark, CoreCognition dataset, encompassing 12 core cognitive concepts. We evaluate 219 models with 10 different prompts, leading to a total of 2409 data points for analysis. Our findings reveal core knowledge deficits in early developed core abilities while models demonstrate human comparable performance in high level cognition. Moreover, we find that low level abilities show little to no scaling, in stark contrast to high level abilities. Finally, we introduce an evaluation technique, Concept Hacking, through which we demonstrate that MLLMs do not genuinely advance toward core knowledge but instead rely on illusory understanding and shortcut learning as they scale. Website with this $\href{https://growing-ai-like-a-child.github.io/}{link}$.
Related papers
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study [50.065744358362345]
Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning.<n>Yet their learning ability, which is crucial for adapting to dynamic environments and acquiring new knowledge, remains underexplored.
arXiv Detail & Related papers (2025-06-16T13:24:50Z) - Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation [75.26829371493189]
Large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking.<n>Existing reasoning benchmarks either focus on domain-specific knowledge (crystallized intelligence) or lack interpretability.<n>We propose DRE-Bench, a dynamic reasoning evaluation benchmark grounded in a hierarchical cognitive framework.
arXiv Detail & Related papers (2025-06-03T09:01:08Z) - Metacognition and Uncertainty Communication in Humans and Large Language Models [3.0493183668102293]
Large language models (LLMs) are increasingly embedded in high-stakes decision contexts.
It is critical to assess whether, how, and to what extent they exhibit metacognitive abilities.
We show that while humans and LLMs can sometimes appear quite aligned in their metacognitive capacities and behaviors, it is clear many differences remain.
arXiv Detail & Related papers (2025-04-18T19:24:17Z) - MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models [50.43793764203352]
We introduce MDK12-Bench, a multi-disciplinary benchmark assessing the reasoning capabilities of MLLMs via real-world K-12 examinations.<n>Our benchmark comprises 140K reasoning instances across diverse difficulty levels from primary school to 12th grade.<n>It features 6,827 instance-level knowledge point annotations based on a well-organized knowledge structure, detailed answer explanations, difficulty labels and cross-year partitions.
arXiv Detail & Related papers (2025-04-08T08:06:53Z) - Towards Understanding How Knowledge Evolves in Large Vision-Language Models [55.82918299608732]
We investigate how multimodal knowledge evolves and eventually induces natural languages in Large Vision-Language Models (LVLMs)<n>We identify two key nodes in knowledge evolution: the critical layers and the mutation layers, dividing the evolution process into three stages: rapid evolution, stabilization, and mutation.<n>Our research is the first to reveal the trajectory of knowledge evolution in LVLMs, providing a fresh perspective for understanding their underlying mechanisms.
arXiv Detail & Related papers (2025-03-31T17:35:37Z) - How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition [75.11808682808065]
This study investigates whether large language models (LLMs) exhibit similar tendencies in understanding semantic size.
Our findings reveal that multi-modal training is crucial for LLMs to achieve more human-like understanding.
Lastly, we examine whether LLMs are influenced by attention-grabbing headlines with larger semantic sizes in a real-world web shopping scenario.
arXiv Detail & Related papers (2025-03-01T03:35:56Z) - The Philosophical Foundations of Growing AI Like A Child [0.0]
This paper argues that both challenges stem from a core discrepancy between human and machine cognitive development.
It explores empirical evidence of core knowledge in humans, analyzes why language models fail to acquire it, and argues that this limitation is not an inherent architectural constraint.
arXiv Detail & Related papers (2025-02-15T09:47:20Z) - Refine Knowledge of Large Language Models via Adaptive Contrastive Learning [54.61213933999464]
A mainstream category of methods is to reduce hallucinations by optimizing the knowledge representation of Large Language Models.
We believe that the process of models refining knowledge can greatly benefit from the way humans learn.
In our work, by imitating the human learning process, we design an Adaptive Contrastive Learning strategy.
arXiv Detail & Related papers (2025-02-11T02:19:13Z) - Can Language Models Learn to Skip Steps? [59.84848399905409]
We study the ability to skip steps in reasoning.
Unlike humans, who may skip steps to enhance efficiency or to reduce cognitive load, models do not possess such motivations.
Our work presents the first exploration into human-like step-skipping ability.
arXiv Detail & Related papers (2024-11-04T07:10:24Z) - Vision Language Models See What You Want but not What You See [9.268588981925234]
Knowing others' intentions and taking others' perspectives are two core components of human intelligence.
Infiltrating machines with these abilities is an important step towards building human-level artificial intelligence.
We investigated intentionality understanding and level-2 perspective-taking in Vision Language Models.
arXiv Detail & Related papers (2024-10-01T01:52:01Z) - CogniDual Framework: Self-Training Large Language Models within a Dual-System Theoretical Framework for Improving Cognitive Tasks [39.43278448546028]
Kahneman's dual-system theory elucidates the human decision-making process, distinguishing between the rapid, intuitive System 1 and the deliberative, rational System 2.
Recent advancements have positioned large language Models (LLMs) as formidable tools nearing human-level proficiency in various cognitive tasks.
This study introduces the textbfCogniDual Framework for LLMs (CFLLMs), designed to assess whether LLMs can, through self-training, evolve from deliberate deduction to intuitive responses.
arXiv Detail & Related papers (2024-09-05T09:33:24Z) - CogLM: Tracking Cognitive Development of Large Language Models [20.138831477848615]
We construct a benchmark CogLM based on Piaget's Theory of Cognitive Development.
CogLM comprises 1,220 questions spanning 10 cognitive abilities crafted by more than 20 human experts.
We find that human-like cognitive abilities have emerged in advanced LLMs (GPT-4), comparable to those of a 20-year-old human.
arXiv Detail & Related papers (2024-08-17T09:49:40Z) - Development of Cognitive Intelligence in Pre-trained Language Models [3.1815791977708834]
Recent studies show evidence for emergent cognitive abilities in Large Pre-trained Language Models.
The developmental trajectories of PLMs consistently exhibit a window of maximal alignment to human cognitive development.
After that window, training appears to serve the engineering goal of reducing loss but not the scientific goal of increasing alignment with human cognition.
arXiv Detail & Related papers (2024-07-01T07:56:36Z) - M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark [25.44666570272266]
We introduce the first cognitive-driven multi-lingual and multi-modal benchmark to evaluate the general intelligence ability of MLLMs.
We identify five key cognitive factors based on the well-recognized Cattell-Horn-Carrol (CHC) model of intelligence.
We go beyond English to encompass other languages based on their popularity, including Chinese, French, Spanish, Portuguese and Korean.
arXiv Detail & Related papers (2024-06-08T04:07:09Z) - Can large language models understand uncommon meanings of common words? [30.527834781076546]
Large language models (LLMs) have shown significant advancements across diverse natural language understanding (NLU) tasks.
Yet, lacking widely acknowledged testing mechanisms, answering whether LLMs are parrots or genuinely comprehend the world' remains unclear.
This paper presents innovative construction of a Lexical Semantic dataset with novel evaluation metrics.
arXiv Detail & Related papers (2024-05-09T12:58:22Z) - What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models [50.97705264224828]
We propose Counterfactual Inception, a novel method that implants counterfactual thinking into Large Multi-modal Models.
We aim for the models to engage with and generate responses that span a wider contextual scene understanding.
Comprehensive analyses across various LMMs, including both open-source and proprietary models, corroborate that counterfactual thinking significantly reduces hallucination.
arXiv Detail & Related papers (2024-03-20T11:27:20Z) - FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z) - ToMBench: Benchmarking Theory of Mind in Large Language Models [41.565202027904476]
ToM is the cognitive capability to perceive and ascribe mental states to oneself and others.<n>Existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination.<n>We introduce ToMBench with three key characteristics: a systematic evaluation framework encompassing 8 tasks and 31 abilities in social cognition, a multiple-choice question format to support automated and unbiased evaluation, and a build-from-scratch bilingual inventory to strictly avoid data leakage.
arXiv Detail & Related papers (2024-02-23T02:05:46Z) - Think Twice: Perspective-Taking Improves Large Language Models'
Theory-of-Mind Capabilities [63.90227161974381]
SimToM is a novel prompting framework inspired by Simulation Theory's notion of perspective-taking.
Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods.
arXiv Detail & Related papers (2023-11-16T22:49:27Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in
Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Do Large Language Models Know What They Don't Know? [74.65014158544011]
Large language models (LLMs) have a wealth of knowledge that allows them to excel in various Natural Language Processing (NLP) tasks.
Despite their vast knowledge, LLMs are still limited by the amount of information they can accommodate and comprehend.
This study aims to evaluate LLMs' self-knowledge by assessing their ability to identify unanswerable or unknowable questions.
arXiv Detail & Related papers (2023-05-29T15:30:13Z) - Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z) - Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
It is theoretically analyzed that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z) - Structured, flexible, and robust: benchmarking and improving large
language models towards more human-like behavior in out-of-distribution
reasoning tasks [39.39138995087475]
We ask how much of human-like thinking can be captured by learning statistical patterns in language alone.
Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization.
We find that humans are far more robust than LLMs on this benchmark.
arXiv Detail & Related papers (2022-05-11T18:14:33Z) - WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z) - Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and
Reasoning [78.13740873213223]
Bongard problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems.
We propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning.
arXiv Detail & Related papers (2020-10-02T03:19:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.