Reasoning in machine vision: learning to think fast and slow
- URL: http://arxiv.org/abs/2506.22075v1
- Date: Fri, 27 Jun 2025 10:03:05 GMT
- Title: Reasoning in machine vision: learning to think fast and slow
- Authors: Shaheer U. Saeed, Yipei Wang, Veeru Kasivisvanathan, Brian R. Davidson, Matthew J. Clarkson, Yipeng Hu, Daniel C. Alexander,
- Abstract summary: Reasoning is a hallmark of human intelligence, enabling adaptive decision-making in complex and unfamiliar scenarios. Machine intelligence remains bound to training data, lacking the ability to dynamically refine solutions at inference time. Here we present a novel learning paradigm that enables machine reasoning in vision by allowing performance improvement with increasing thinking time.
- Score: 10.430190333487957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reasoning is a hallmark of human intelligence, enabling adaptive decision-making in complex and unfamiliar scenarios. In contrast, machine intelligence remains bound to training data, lacking the ability to dynamically refine solutions at inference time. While some recent advances have explored reasoning in machines, these efforts are largely limited to verbal domains such as mathematical problem-solving, where explicit rules govern step-by-step reasoning. Other critical real-world tasks - including visual perception, spatial reasoning, and radiological diagnosis - require non-verbal reasoning, which remains an open challenge. Here we present a novel learning paradigm that enables machine reasoning in vision by allowing performance improvement with increasing thinking time (inference-time compute), even under conditions where labelled data is very limited. Inspired by dual-process theories of human cognition in psychology, our approach integrates a fast-thinking System I module for familiar tasks with a slow-thinking System II module that iteratively refines solutions using self-play reinforcement learning. This paradigm mimics human reasoning by proposing, competing over, and refining solutions in data-scarce scenarios. We demonstrate superior performance through extended thinking time, compared not only to large-scale supervised learning but also to foundation models and even human experts, in real-world vision tasks. These tasks include computer-vision benchmarks and cancer localisation on medical images across five organs, showcasing transformative potential for non-verbal machine reasoning.
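To make the dual-process idea concrete, below is a minimal sketch of what a fast/slow inference loop of this kind might look like. The class names (SystemI, SystemII), the propose/score/refine methods and the thinking_steps budget are illustrative assumptions, not the authors' implementation: in the paper the slow-thinking module is trained with self-play reinforcement learning, whereas this sketch substitutes a simple stand-in scoring heuristic to show how extra inference-time compute can refine an initial fast answer.

```python
# Illustrative sketch only; interfaces and names are assumed, not taken from the paper.
import numpy as np


class SystemI:
    """Fast-thinking module: a fixed feed-forward predictor (e.g. a trained network)."""

    def predict(self, image: np.ndarray) -> np.ndarray:
        # Placeholder: return a coarse initial solution (e.g. a segmentation mask).
        return np.zeros(image.shape[:2])


class SystemII:
    """Slow-thinking module: proposes competing refinements and keeps the best one.
    In the paper this propose/compete/refine behaviour is learned via self-play RL;
    here a hand-written heuristic stands in for the learned components."""

    def __init__(self, n_proposals: int = 8, seed: int = 0):
        self.n_proposals = n_proposals
        self.rng = np.random.default_rng(seed)

    def propose(self, solution: np.ndarray) -> np.ndarray:
        # Hypothetical proposal step: perturb the current solution.
        return np.clip(solution + 0.1 * self.rng.standard_normal(solution.shape), 0, 1)

    def score(self, image: np.ndarray, solution: np.ndarray) -> float:
        # Hypothetical value function; a stand-in for a learned reward signal.
        return -float(np.abs(solution - image.mean()).mean())

    def refine(self, image: np.ndarray, solution: np.ndarray, thinking_steps: int) -> np.ndarray:
        best, best_score = solution, self.score(image, solution)
        for _ in range(thinking_steps):  # more steps = more inference-time compute
            for cand in (self.propose(best) for _ in range(self.n_proposals)):
                s = self.score(image, cand)  # candidates compete; the best survives
                if s > best_score:
                    best, best_score = cand, s
        return best


def infer(image: np.ndarray, thinking_steps: int = 0) -> np.ndarray:
    """Return the fast System I answer when thinking_steps == 0; otherwise refine it."""
    fast = SystemI().predict(image)
    if thinking_steps == 0:
        return fast
    return SystemII().refine(image, fast, thinking_steps)


if __name__ == "__main__":
    img = np.random.rand(64, 64, 3)
    quick = infer(img)                          # System I only
    deliberate = infer(img, thinking_steps=20)  # System I + System II refinement
```

Under these assumptions, increasing thinking_steps is the knob that trades inference-time compute for solution quality, which is the behaviour the abstract describes.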
Related papers
- DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning [11.242852367476015]
DeepEyes is a model with "thinking with images" capabilities incentivized through end-to-end reinforcement learning. We propose a tool-use-oriented data selection mechanism and a reward strategy to encourage successful tool-assisted reasoning trajectories. DeepEyes achieves significant performance gains on fine-grained perception and reasoning benchmarks.
arXiv Detail & Related papers (2025-05-20T13:48:11Z)
- VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search [89.43196232124883]
VisuoThink is a novel framework that seamlessly integrates visuospatial and linguistic domains. It enables progressive visual-textual reasoning and incorporates test-time scaling through look-ahead tree search.
arXiv Detail & Related papers (2025-04-12T08:37:30Z)
- Dual Thinking and Logical Processing -- Are Multi-modal Large Language Models Closing the Gap with Human Vision ? [5.076961098583674]
We introduce a novel adversarial dataset to provide evidence for the dual thinking framework in human vision. Our psychophysical studies show the presence of multiple inferences in rapid succession. Analysis of errors shows that the early stopping of visual processing can result in missing relevant information.
arXiv Detail & Related papers (2024-06-11T05:50:34Z)
- Improving deep learning with prior knowledge and cognitive models: A survey on enhancing explainability, adversarial robustness and zero-shot learning [0.0]
We review current and emerging knowledge-informed and brain-inspired cognitive systems for realizing adversarial defenses.
Brain-inspired cognition methods use computational models that mimic the human mind to enhance intelligent behavior in artificial agents and autonomous robots.
arXiv Detail & Related papers (2024-03-11T18:11:00Z)
- Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z)
- A Survey on Brain-Inspired Deep Learning via Predictive Coding [85.93245078403875]
Predictive coding (PC) has shown promising performance in machine intelligence tasks. PC can model information processing in different brain areas and can be used in cognitive control and robotics.
arXiv Detail & Related papers (2023-08-15T16:37:16Z)
- Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)
- Memory-Augmented Theory of Mind Network [59.9781556714202]
Social reasoning requires the capacity of theory of mind (ToM) to contextualise and attribute mental states to others.
Recent machine learning approaches to ToM have demonstrated that we can train the observer to read the past and present behaviours of other agents.
We tackle these challenges by equipping the observer with novel neural memory mechanisms to encode information about others and hierarchical attention to selectively retrieve it.
This results in ToMMY, a theory of mind model that learns to reason while making few assumptions about the underlying mental processes.
arXiv Detail & Related papers (2023-01-17T14:48:58Z)
- Learning to Complement Humans [67.38348247794949]
A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks.
We demonstrate how an end-to-end learning strategy can be harnessed to optimize the combined performance of human-machine teams.
arXiv Detail & Related papers (2020-05-01T20:00:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.