Slot Abstractors: Toward Scalable Abstract Visual Reasoning
- URL: http://arxiv.org/abs/2403.03458v2
- Date: Sun, 2 Jun 2024 23:04:43 GMT
- Title: Slot Abstractors: Toward Scalable Abstract Visual Reasoning
- Authors: Shanka Subhra Mondal, Jonathan D. Cohen, Taylor W. Webb
- Abstract summary: We propose Slot Abstractors, an approach to abstract visual reasoning that can be scaled to problems involving a large number of objects and multiple relations among them.
The approach displays state-of-the-art performance across four abstract visual reasoning tasks, as well as an abstract reasoning task involving real-world images.
- Score: 5.262577780347204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Abstract visual reasoning is a characteristically human ability, allowing the identification of relational patterns that are abstracted away from object features, and the systematic generalization of those patterns to unseen problems. Recent work has demonstrated strong systematic generalization in visual reasoning tasks involving multi-object inputs, through the integration of slot-based methods used for extracting object-centric representations coupled with strong inductive biases for relational abstraction. However, this approach was limited to problems containing a single rule, and was not scalable to visual reasoning problems containing a large number of objects. Other recent work proposed Abstractors, an extension of Transformers that incorporates strong relational inductive biases, thereby inheriting the Transformer's scalability and multi-head architecture, but it has yet to be demonstrated how this approach might be applied to multi-object visual inputs. Here we combine the strengths of the above approaches and propose Slot Abstractors, an approach to abstract visual reasoning that can be scaled to problems involving a large number of objects and multiple relations among them. The approach displays state-of-the-art performance across four abstract visual reasoning tasks, as well as an abstract reasoning task involving real-world images.
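The abstract describes combining slot-based object-centric encoders with Abstractor-style relational cross-attention, in which attention values are learned, input-independent symbols rather than object features. The following is a minimal sketch of that combination, not the authors' implementation: the module names, single-head simplification, dimensions, and the assumption that slots come from a pretrained slot-attention encoder are all illustrative.

```python
# Minimal sketch (assumptions noted above): object slots are fed to a
# relational cross-attention block in the spirit of Abstractors.
import torch
import torch.nn as nn


class RelationalCrossAttention(nn.Module):
    """Attention whose values are learned symbols, not object features,
    so the output depends only on relations among the input slots."""

    def __init__(self, dim: int, num_symbols: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        # Learned, input-independent symbol vectors play the role of values.
        self.symbols = nn.Parameter(torch.randn(num_symbols, dim))

    def forward(self, slots: torch.Tensor) -> torch.Tensor:
        # slots: (batch, num_slots, dim)
        q, k = self.query(slots), self.key(slots)
        rel = torch.softmax(q @ k.transpose(-2, -1) / slots.shape[-1] ** 0.5, dim=-1)
        # The relation matrix mixes the symbol bound to each slot position.
        sym = self.symbols[: slots.shape[1]].unsqueeze(0).expand(slots.shape[0], -1, -1)
        return rel @ sym


class SlotAbstractorSketch(nn.Module):
    """Hypothetical wrapper: relational block plus a task readout head."""

    def __init__(self, dim: int = 64, num_slots: int = 9, num_classes: int = 4):
        super().__init__()
        self.rel_attn = RelationalCrossAttention(dim, num_slots)
        self.head = nn.Sequential(nn.Flatten(1), nn.Linear(num_slots * dim, num_classes))

    def forward(self, slots: torch.Tensor) -> torch.Tensor:
        # `slots` are assumed to come from a slot-attention-style encoder.
        return self.head(self.rel_attn(slots))


if __name__ == "__main__":
    model = SlotAbstractorSketch()
    slots = torch.randn(2, 9, 64)   # batch of 2 scenes, 9 object slots each
    print(model(slots).shape)       # torch.Size([2, 4])
```

Because the values are learned symbols rather than slot features, the readout sees only the pattern of relations among objects, which is the inductive bias both lines of prior work rely on; scaling to many objects then reduces to standard Transformer-style attention over more slots.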
Related papers
- Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem [37.27516441519387]
We show that state-of-the-art vision language models exhibit surprising failures on basic multi-object reasoning tasks that humans perform with near perfect accuracy.
We find that many of the puzzling failures of state-of-the-art VLMs can be explained as arising due to the binding problem, and that these failure modes are strikingly similar to the limitations exhibited by rapid, feedforward processing in the human brain.
arXiv Detail & Related papers (2024-10-31T22:24:47Z)
- Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks.
We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture.
Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv Detail & Related papers (2024-04-24T17:59:48Z)
- Neural Causal Abstractions [63.21695740637627]
We develop a new family of causal abstractions by clustering variables and their domains.
We show that such abstractions are learnable in practical settings through Neural Causal Models.
Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.
arXiv Detail & Related papers (2024-01-05T02:00:27Z)
- Learning Differentiable Logic Programs for Abstract Visual Reasoning [18.82429807065658]
Differentiable forward reasoning has been developed to integrate reasoning with gradient-based machine learning paradigms.
NEUMANN is a graph-based differentiable forward reasoner, passing messages in a memory-efficient manner and handling structured programs with functors.
We demonstrate that NEUMANN solves visual reasoning tasks efficiently, outperforming neural, symbolic, and neuro-symbolic baselines.
arXiv Detail & Related papers (2023-07-03T11:02:40Z)
- Systematic Visual Reasoning through Object-Centric Relational Abstraction [5.914610036560008]
We introduce OCRA, a model that extracts explicit representations of both objects and abstract relations.
It achieves strong systematic generalization in tasks involving complex visual displays.
arXiv Detail & Related papers (2023-06-04T22:47:17Z)
- Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)
- Towards Preserving Semantic Structure in Argumentative Multi-Agent via Abstract Interpretation [0.0]
We investigate the notion of abstraction from the model-checking perspective.
Arguments that defend the same position from various points of view are grouped together, thereby reducing the size of the argumentation framework.
arXiv Detail & Related papers (2022-11-28T21:32:52Z)
- Causal Reasoning Meets Visual Representation Learning: A Prospective Study [117.08431221482638]
Existing visual models increasingly struggle with a lack of interpretability, robustness, and out-of-distribution generalization.
Inspired by the strong inference ability of human-level agents, researchers have devoted great effort in recent years to developing causal reasoning paradigms.
This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods.
arXiv Detail & Related papers (2022-04-26T02:22:28Z)
- Constellation: Learning relational abstractions over objects for compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step toward explicitly representing visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z)
- Hierarchical Relational Inference [80.00374471991246]
We propose a novel approach to physical reasoning that models objects as hierarchies of parts that may locally behave separately, but also act more globally as a single whole.
Unlike prior approaches, our method learns in an unsupervised fashion directly from raw visual images.
It explicitly distinguishes multiple levels of abstraction and improves over a strong baseline at modeling synthetic and real-world videos.
arXiv Detail & Related papers (2020-10-07T20:19:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.