Probing Mechanical Reasoning in Large Vision Language Models
- URL: http://arxiv.org/abs/2410.00318v1
- Date: Tue, 1 Oct 2024 01:33:10 GMT
- Title: Probing Mechanical Reasoning in Large Vision Language Models
- Authors: Haoran Sun, Qingying Gao, Haiyun Lyu, Dezhi Luo, Hokin Deng, Yijiang Li
- Abstract summary: Mechanical reasoning allows us to design tools, build bridges and canals, and construct houses, achievements that laid the foundation of human civilization.
We leverage the MechBench of CogDevelop2K to test understanding of mechanical system stability, gears and pulley systems, seesaw-like systems and the principle of leverage, and inertia and motion.
- Score: 9.268588981925234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mechanical reasoning is a fundamental ability that sets human intelligence apart from that of other animals. Mechanical reasoning allows us to design tools, build bridges and canals, and construct houses, achievements that laid the foundation of human civilization. Endowing machines with this ability is an important step toward building human-level artificial intelligence. Recently, Li et al. built CogDevelop2K, a data-intensive cognitive experiment benchmark for assaying the developmental trajectory of machine intelligence (Li et al., 2024). Here, to investigate mechanical reasoning in Vision Language Models, we leverage the MechBench of CogDevelop2K, which contains approximately 150 cognitive experiments, to test understanding of mechanical system stability, gears and pulley systems, seesaw-like systems and the principle of leverage, inertia and motion, and fluid-related systems in Large Vision Language Models. We observe diverse yet consistent behaviors across these aspects in VLMs.
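To make the described evaluation protocol concrete, below is a minimal sketch of how a MechBench-style probe could be scored per category (stability, gears, pulleys, seesaws, inertia, fluids). The `MechItem` schema, the category names, and the `query_vlm` callable are illustrative assumptions, not the benchmark's actual interface or the authors' harness.

```python
# Minimal sketch of a MechBench-style per-category evaluation loop (illustrative only).
# `query_vlm` is a hypothetical stand-in for whatever API serves the model;
# the item fields and category names are assumptions, not the benchmark's schema.
from dataclasses import dataclass
from collections import defaultdict
from typing import Callable

@dataclass
class MechItem:
    category: str        # e.g. "gears", "pulleys", "seesaw", "stability", "fluids"
    image_path: str      # path to the stimulus image
    question: str        # question about the depicted mechanical system
    choices: list[str]   # answer options shown to the model
    answer: str          # ground-truth choice

def evaluate(items: list[MechItem],
             query_vlm: Callable[[str, str], str]) -> dict[str, float]:
    """Return per-category accuracy for a VLM queried with (image_path, prompt)."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item in items:
        prompt = f"{item.question}\nOptions: " + "; ".join(item.choices)
        prediction = query_vlm(item.image_path, prompt).strip()
        total[item.category] += 1
        # Exact string match is a simplification; real scoring may parse free-form answers.
        if prediction == item.answer:
            correct[item.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}
```

Reporting accuracy per category rather than a single aggregate score is what lets one observe the "diverse yet consistent" behavior across mechanical-reasoning aspects that the abstract refers to.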
Related papers
- OminiAdapt: Learning Cross-Task Invariance for Robust and Environment-Aware Robotic Manipulation [1.4719692998274154]
This paper proposes an imitation learning algorithm tailored for humanoid robots.
By focusing on the primary task objectives, the proposed algorithm suppresses environmental disturbances.
Experimental results demonstrate that the proposed method exhibits robustness and scalability across various typical task scenarios.
arXiv Detail & Related papers (2025-03-27T08:28:22Z) - Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation [90.00687889213991]
Solving complex long-horizon robotic manipulation problems requires sophisticated high-level planning capabilities.
Vision-language models (VLMs) pretrained on Internet data could in principle offer a framework for tackling such problems.
In this paper, we introduce a novel test-time framework that enhances VLMs' physical reasoning capabilities for multi-stage manipulation tasks.
arXiv Detail & Related papers (2025-02-23T20:42:15Z) - Towards Conscious Service Robots [21.66931637743555]
Real-world robotics face challenges like variability, high-dimensional state spaces, non-linear dependencies, and partial observability.
Unlike current machine learning models, humans adapt quickly to changes and new tasks due to a cognitive architecture that enables systematic generalization and meta-cognition.
The next generation of service robots will handle novel situations and monitor themselves to avoid risks and mitigate errors.
arXiv Detail & Related papers (2025-01-25T12:32:52Z) - The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence [0.0]
A traditional approach to assessing emerging intelligence in the theory of intelligent systems is based on similarity, i.e., "imitation" of human-like actions and behaviors.
We argue that under some natural assumptions, developing intelligent systems will be able to form their own intents and objectives.
arXiv Detail & Related papers (2024-10-14T13:39:58Z) - Vision Language Models See What You Want but not What You See [9.268588981925234]
Knowing others' intentions and taking others' perspectives are two core components of human intelligence.
In this paper, we investigate intentionality understanding and perspective-taking in Vision Language Models.
Surprisingly, we find that VLMs achieve high performance on intentionality understanding but lower performance on perspective-taking.
arXiv Detail & Related papers (2024-10-01T01:52:01Z) - Causal Reinforcement Learning for Optimisation of Robot Dynamics in Unknown Environments [4.494898338391223]
This work introduces a novel Causal Reinforcement Learning approach to enhancing robotics operations.
Our proposed machine learning architecture enables robots to learn the causal relationships between the visual characteristics of the objects.
arXiv Detail & Related papers (2024-09-20T11:40:51Z) - Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse set of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z) - On the Vulnerability of LLM/VLM-Controlled Robotics [54.57914943017522]
We highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities.
Our results show that simple input perturbations reduce task execution success rates by 22.2% and 14.6% in two representative LLM/VLM-controlled robotic systems.
arXiv Detail & Related papers (2024-02-15T22:01:45Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems [0.0]
Recent advances in machine learning have enabled autonomous systems to perceive and comprehend objects.
It is essential to consider both the explicit teaching provided by humans and the implicit teaching obtained by observing human behavior.
We argue for considering both types of inputs, as well as human-in-the-loop and incremental learning techniques.
arXiv Detail & Related papers (2023-09-11T19:35:12Z) - Non-equilibrium physics: from spin glasses to machine and neural learning [0.0]
Disordered many-body systems exhibit a wide range of emergent phenomena across different scales.
We aim to characterize such emergent intelligence in disordered systems through statistical physics.
We uncover relationships between learning mechanisms and physical dynamics that could serve as guiding principles for designing intelligent systems.
arXiv Detail & Related papers (2023-08-03T04:56:47Z) - Incremental procedural and sensorimotor learning in cognitive humanoid robots [52.77024349608834]
This work presents a cognitive agent that can learn procedures incrementally.
We show the cognitive functions required in each substage and how adding new functions helps address tasks previously unsolved by the agent.
Results show that this approach is capable of solving complex tasks incrementally.
arXiv Detail & Related papers (2023-04-30T22:51:31Z) - Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z) - Building Human-like Communicative Intelligence: A Grounded Perspective [1.0152838128195465]
After making astounding progress in language learning, AI systems seem to be approaching a ceiling that does not reflect important aspects of human communicative capacities.
This paper suggests that the dominant cognitively-inspired AI directions, based on nativist and symbolic paradigms, lack necessary substantiation and concreteness to guide progress in modern AI.
I propose a list of concrete, implementable components for building "grounded" linguistic intelligence.
arXiv Detail & Related papers (2022-01-02T01:43:24Z) - From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence [113.06484656032978]
The article argues that embodied intelligence is a key driver for the advancement of machine learning technology.
We highlight challenges and opportunities specific to embodied intelligence.
We propose research directions which may significantly advance the state-of-the-art in robot learning.
arXiv Detail & Related papers (2021-10-28T16:04:01Z) - Fit to Measure: Reasoning about Sizes for Robust Object Recognition [0.5352699766206808]
We present an approach to integrating knowledge about object sizes into an ML-based architecture.
Our experiments in a real-world robotic scenario show that this combined approach ensures a significant performance increase over state-of-the-art machine learning methods.
arXiv Detail & Related papers (2020-10-27T13:54:37Z) - Future Trends for Human-AI Collaboration: A Comprehensive Taxonomy of AI/AGI Using Multiple Intelligences and Learning Styles [95.58955174499371]
We describe various aspects of multiple human intelligences and learning styles, which may affect a variety of AI problem domains.
Future AI systems will be able not only to communicate with human users and each other, but also to efficiently exchange knowledge and wisdom.
arXiv Detail & Related papers (2020-08-07T21:00:13Z) - Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article deals with aspects of modeling commonsense reasoning, focusing on such domains as interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z) - Learning to Complement Humans [67.38348247794949]
A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks.
We demonstrate how an end-to-end learning strategy can be harnessed to optimize the combined performance of human-machine teams.
arXiv Detail & Related papers (2020-05-01T20:00:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.