Probing Mechanical Reasoning in Large Vision Language Models
- URL: http://arxiv.org/abs/2410.00318v1
- Date: Tue, 1 Oct 2024 01:33:10 GMT
- Title: Probing Mechanical Reasoning in Large Vision Language Models
- Authors: Haoran Sun, Qingying Gao, Haiyun Lyu, Dezhi Luo, Hokin Deng, Yijiang Li
- Abstract summary: Mechanical reasoning allows us to design tools, build bridges and canals, and construct houses, laying the foundation of human civilization.
We leverage the MechBench of CogDevelop2K to test understanding of mechanical system stability, gear and pulley systems, seesaw-like systems and the principle of leverage, and inertia and motion.
- Score: 9.268588981925234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mechanical reasoning is a fundamental ability that sets human intelligence apart from other animal intelligence. Mechanical reasoning allows us to design tools, build bridges and canals, and construct houses, which set the foundation of human civilization. Embedding machines with such ability is an important step towards building human-level artificial intelligence. Recently, Li et al. built CogDevelop2K, a data-intensive cognitive experiment benchmark for assaying the developmental trajectory of machine intelligence (Li et al., 2024). Here, to investigate mechanical reasoning in Vision Language Models, we leverage the MechBench of CogDevelop2K, which contains approximately 150 cognitive experiments, to test understanding of mechanical system stability, gear and pulley systems, seesaw-like systems and the principle of leverage, inertia and motion, and other fluid-related systems in Large Vision Language Models. We observe diverse yet consistent behaviors over these aspects in VLMs.
Related papers
- The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence [0.0]
A traditional approach to assessing emerging intelligence in the theory of intelligent systems is based on similarity to, or "imitation" of, human-like actions and behaviors.
We argue that under some natural assumptions, developing intelligent systems will be able to form their own intents and objectives.
arXiv Detail & Related papers (2024-10-14T13:39:58Z) - Vision Language Models See What You Want but not What You See [9.268588981925234]
Knowing others' intentions and taking others' perspectives are two core components of human intelligence.
In this paper, we investigate intentionality understanding and perspective-taking in Vision Language Models.
Surprisingly, we find VLMs achieving high performance on intentionality understanding but lower performance on perspective-taking.
arXiv Detail & Related papers (2024-10-01T01:52:01Z) - Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems [0.0]
Recent advances in machine learning have enabled autonomous systems to perceive and comprehend objects.
It is essential to consider both the explicit teaching provided by humans and the implicit teaching obtained by observing human behavior.
We argue for considering both types of inputs, as well as human-in-the-loop and incremental learning techniques.
arXiv Detail & Related papers (2023-09-11T19:35:12Z) - Non-equilibrium physics: from spin glasses to machine and neural learning [0.0]
Disordered many-body systems exhibit a wide range of emergent phenomena across different scales.
We aim to characterize such emergent intelligence in disordered systems through statistical physics.
We uncover relationships between learning mechanisms and physical dynamics that could serve as guiding principles for designing intelligent systems.
arXiv Detail & Related papers (2023-08-03T04:56:47Z) - Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z) - Building Human-like Communicative Intelligence: A Grounded Perspective [1.0152838128195465]
After making astounding progress in language learning, AI systems seem to be approaching a ceiling that does not reflect important aspects of human communicative capacities.
This paper suggests that the dominant cognitively inspired AI directions, based on nativist and symbolic paradigms, lack the substantiation and concreteness necessary to guide progress in modern AI.
I propose a list of concrete, implementable components for building "grounded" linguistic intelligence.
arXiv Detail & Related papers (2022-01-02T01:43:24Z) - Future Trends for Human-AI Collaboration: A Comprehensive Taxonomy of AI/AGI Using Multiple Intelligences and Learning Styles [95.58955174499371]
We describe various aspects of multiple human intelligences and learning styles, which may impact on a variety of AI problem domains.
Future AI systems will be able not only to communicate with human users and each other, but also to efficiently exchange knowledge and wisdom.
arXiv Detail & Related papers (2020-08-07T21:00:13Z) - Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article deals with aspects of modeling commonsense reasoning, focusing on the domain of interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z) - Learning to Complement Humans [67.38348247794949]
A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks.
We demonstrate how an end-to-end learning strategy can be harnessed to optimize the combined performance of human-machine teams.
arXiv Detail & Related papers (2020-05-01T20:00:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.