Learning to reason over visual objects
- URL: http://arxiv.org/abs/2303.02260v2
- Date: Thu, 26 Oct 2023 21:24:47 GMT
- Title: Learning to reason over visual objects
- Authors: Shanka Subhra Mondal, Taylor Webb, Jonathan D. Cohen
- Abstract summary: We investigate the extent to which a general-purpose mechanism for processing visual scenes in terms of objects might help promote abstract visual reasoning.
We find that an inductive bias for object-centric processing may be a key component of abstract visual reasoning.
- Score: 6.835410768769661
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A core component of human intelligence is the ability to identify abstract
patterns inherent in complex, high-dimensional perceptual data, as exemplified
by visual reasoning tasks such as Raven's Progressive Matrices (RPM). Motivated
by the goal of designing AI systems with this capacity, recent work has focused
on evaluating whether neural networks can learn to solve RPM-like problems.
Previous work has generally found that strong performance on these problems
requires the incorporation of inductive biases that are specific to the RPM
problem format, raising the question of whether such models might be more
broadly useful. Here, we investigated the extent to which a general-purpose
mechanism for processing visual scenes in terms of objects might help promote
abstract visual reasoning. We found that a simple model, consisting only of an
object-centric encoder and a transformer reasoning module, achieved
state-of-the-art results on two challenging RPM-like benchmarks (PGM
and I-RAVEN), as well as a novel benchmark with greater visual complexity
(CLEVR-Matrices). These results suggest that an inductive bias for
object-centric processing may be a key component of abstract visual reasoning,
obviating the need for problem-specific inductive biases.
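The architecture described above — an object-centric encoder that groups pixels into object slots, followed by a transformer-style reasoning module over those slots — can be illustrated with a minimal numpy sketch. This is not the authors' code; the slot pooling and single-head attention below are simplified stand-ins for the components named in the abstract, and all shapes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_pool(features, slots, n_iter=3):
    # Simplified slot-attention-style grouping: slots compete for pixels,
    # then each slot is updated as a weighted mean of its pixels' features.
    for _ in range(n_iter):
        attn = softmax(features @ slots.T, axis=1)       # (pixels, slots), normalized over slots
        attn = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = attn.T @ features                        # (slots, dim)
    return slots

def self_attention(x):
    # Single-head self-attention over the sequence of object slots,
    # standing in for the transformer reasoning module.
    attn = softmax(x @ x.T / np.sqrt(x.shape[-1]), axis=-1)
    return attn @ x

# Toy scene: 100 "pixel" features of dimension 16, pooled into 4 object slots.
features = rng.normal(size=(100, 16))
slots = rng.normal(size=(4, 16))
objects = slot_pool(features, slots)
reasoned = self_attention(objects)
print(objects.shape, reasoned.shape)  # (4, 16) (4, 16)
```

In the full model, the reasoned slot representations would be scored against the candidate answer panels; here the sketch only shows the encode-then-reason pipeline over objects.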
Related papers
- Solving the Clustering Reasoning Problems by Modeling a Deep-Learning-Based Probabilistic Model [1.7955614278088239]
We introduce PMoC, a deep-learning-based probabilistic model, achieving high reasoning accuracy in the Bongard-Logo.
As a bonus, we also designed Pose-Transformer for complex visual abstract reasoning tasks.
arXiv Detail & Related papers (2024-03-05T18:08:29Z)
- Towards Generative Abstract Reasoning: Completing Raven's Progressive Matrix via Rule Abstraction and Selection [52.107043437362556]
Raven's Progressive Matrix (RPM) is widely used to probe abstract visual reasoning in machine intelligence.
Participants in RPM tests can show powerful reasoning ability by inferring and combining attribute-changing rules.
We propose a deep latent variable model for answer generation problems through Rule AbstractIon and SElection.
arXiv Detail & Related papers (2024-01-18T13:28:44Z)
- Learning Abstract Visual Reasoning via Task Decomposition: A Case Study in Raven Progressive Matrices [0.24475591916185496]
In Raven Progressive Matrices, the task is to choose one of the available answers given a context.
In this study, we propose a deep learning architecture based on the transformer blueprint.
The multidimensional predictions obtained in this way are then directly juxtaposed to choose the answer.
arXiv Detail & Related papers (2023-08-12T11:02:21Z)
- Systematic Visual Reasoning through Object-Centric Relational Abstraction [5.914610036560008]
We introduce OCRA, a model that extracts explicit representations of both objects and abstract relations.
It achieves strong systematic generalizations in tasks involving complex visual displays.
arXiv Detail & Related papers (2023-06-04T22:47:17Z)
- Rotating Features for Object Discovery [74.1465486264609]
We present Rotating Features, a generalization of complex-valued features to higher dimensions, and a new evaluation procedure for extracting objects from distributed representations.
Together, these advancements enable us to scale distributed object-centric representations from simple toy to real-world data.
arXiv Detail & Related papers (2023-06-01T12:16:26Z)
- Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose the Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
arXiv Detail & Related papers (2023-03-23T05:17:05Z)
- One-shot Visual Reasoning on RPMs with an Application to Video Frame Prediction [1.0932251830449902]
Raven's Progressive Matrices (RPMs) are frequently used to evaluate human visual reasoning ability.
We propose a One-shot Human-Understandable ReaSoner (Os-HURS) to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks.
arXiv Detail & Related papers (2021-11-24T06:51:38Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- DAReN: A Collaborative Approach Towards Reasoning And Disentangling [27.50150027974947]
We propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together.
We accomplish this using a novel learning framework Disentangling based Abstract Reasoning Network (DAReN) based on the principles of GM-RPM.
arXiv Detail & Related papers (2021-09-27T16:10:30Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning [95.18337034090648]
We propose Machine Number Sense (MNS), a dataset of visual arithmetic problems automatically generated using a grammar model, an And-Or Graph (AOG).
These visual arithmetic problems are in the form of geometric figures.
We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
arXiv Detail & Related papers (2020-04-25T17:14:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.