Brain in a Vat: On Missing Pieces Towards Artificial General
  Intelligence in Large Language Models
- URL: http://arxiv.org/abs/2307.03762v1
- Date: Fri, 7 Jul 2023 13:58:16 GMT
- Title: Brain in a Vat: On Missing Pieces Towards Artificial General
  Intelligence in Large Language Models
- Authors: Yuxi Ma, Chi Zhang, Song-Chun Zhu
- Abstract summary: We propose four characteristics of generally intelligent agents.
We argue that active engagement with objects in the real world delivers more robust signals for forming conceptual representations.
We conclude by outlining promising future research directions in the field of artificial general intelligence.
- Score: 83.63242931107638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this perspective paper, we first comprehensively review existing
evaluations of Large Language Models (LLMs) using both standardized tests and
ability-oriented benchmarks. We pinpoint several problems with current
evaluation methods that tend to overstate the capabilities of LLMs. We then
articulate what artificial general intelligence should encompass beyond the
capabilities of LLMs. We propose four characteristics of generally intelligent
agents: 1) they can perform unlimited tasks; 2) they can generate new tasks
within a context; 3) they operate based on a value system that underpins task
generation; and 4) they have a world model reflecting reality, which shapes
their interaction with the world. Building on this viewpoint, we highlight the
missing pieces in artificial general intelligence, that is, the unity of
knowing and acting. We argue that active engagement with objects in the real
world delivers more robust signals for forming conceptual representations.
Additionally, knowledge acquisition isn't solely reliant on passive input but
requires repeated trials and errors. We conclude by outlining promising future
research directions in the field of artificial general intelligence.
 
      
Related papers
- Teaching Language Models To Gather Information Proactively [53.85419549904644]
 Large language models (LLMs) are increasingly expected to function as collaborative partners.
In this work, we introduce a new task paradigm: proactive information gathering.
We design a scalable framework that generates partially specified, real-world tasks, masking key information.
Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information.
 arXiv  Detail & Related papers  (2025-07-28T23:50:09Z)
- Pixels, Patterns, but No Poetry: To See The World like Humans [33.773551676022514]
 State-of-the-art MLLMs exhibit catastrophic failures on our perceptual tasks, which are trivial for humans.
This paper shifts focus from reasoning to perception.
 arXiv  Detail & Related papers  (2025-07-21T21:50:16Z)
- From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information? [34.959850282872594]
 We present AR-Bench, a novel benchmark designed explicitly to evaluate an LLM's active reasoning skills.
AR-Bench comprises three task families: detective cases, situation puzzles, and guessing numbers.
Empirical evaluation on AR-Bench demonstrates that contemporary LLMs exhibit pronounced difficulties with active reasoning.
 arXiv  Detail & Related papers  (2025-06-09T23:56:41Z)
- Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation [75.26829371493189]
 Large language models (LLMs) have demonstrated impressive reasoning capacities that mirror human-like thinking.
Existing reasoning benchmarks either focus on domain-specific knowledge (crystallized intelligence) or lack interpretability.
We propose DRE-Bench, a dynamic reasoning evaluation benchmark grounded in a hierarchical cognitive framework.
 arXiv  Detail & Related papers  (2025-06-03T09:01:08Z)
- Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning [53.45295657891099]
 This paper proposes Visual-O1, a multi-modal multi-turn chain-of-thought reasoning framework.
It simulates human multi-modal multi-turn reasoning, providing instantial experience for highly intelligent models.
Our work highlights the potential of artificial intelligence to work like humans in real-world scenarios with uncertainty and ambiguity.
 arXiv  Detail & Related papers  (2024-10-04T11:18:41Z)
- How to Measure the Intelligence of Large Language Models? [0.24578723416255752]
 We argue that the intelligence of language models should not only be assessed by task-specific statistical metrics.
The choice of metrics has already been shown to dramatically influence assessments of potential intelligence emergence.
 arXiv  Detail & Related papers  (2024-07-30T13:53:48Z)
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [129.08019405056262]
 Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI).
Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities.
In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI.
 arXiv  Detail & Related papers  (2024-07-09T14:14:47Z)
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
 Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
 arXiv  Detail & Related papers  (2024-07-07T07:15:49Z)
- Can large language models understand uncommon meanings of common words? [30.527834781076546]
 Large language models (LLMs) have shown significant advancements across diverse natural language understanding (NLU) tasks.
Yet, lacking widely acknowledged testing mechanisms, it remains unclear whether LLMs are parrots or genuinely comprehend the world.
This paper presents an innovative construction of a Lexical Semantic dataset with novel evaluation metrics.
 arXiv  Detail & Related papers  (2024-05-09T12:58:22Z)
- A Survey on Robotics with Foundation Models: toward Embodied AI [30.999414445286757]
 Recent advances in computer vision, natural language processing, and multi-modality learning have shown that the foundation models have superhuman capabilities for specific tasks.
This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control.
 arXiv  Detail & Related papers  (2024-02-04T07:55:01Z)
- MacGyver: Are Large Language Models Creative Problem Solvers? [87.70522322728581]
 We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting.
We create MACGYVER, an automatically generated dataset consisting of over 1,600 real-world problems.
We present our collection to both LLMs and humans to compare and contrast their problem-solving abilities.
 arXiv  Detail & Related papers  (2023-11-16T08:52:27Z)
- A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds? [2.7342737448775534]
 Large Language Models (LLMs) have been linked to claims about human-like linguistic performance.
We analyze the contribution of LLMs as theoretically informative representations of a target cognitive system.
We evaluate the models' ability to see the bigger picture, through top-down feedback from higher levels of processing.
 arXiv  Detail & Related papers  (2023-07-26T18:58:53Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
 We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
 arXiv  Detail & Related papers  (2021-10-27T12:25:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     