Learning Abstract Visual Reasoning via Task Decomposition: A Case Study
in Raven Progressive Matrices
- URL: http://arxiv.org/abs/2308.06528v2
- Date: Thu, 7 Mar 2024 18:17:02 GMT
- Title: Learning Abstract Visual Reasoning via Task Decomposition: A Case Study
in Raven Progressive Matrices
- Authors: Jakub Kwiatkowski and Krzysztof Krawiec
- Abstract summary: In Raven Progressive Matrices, the task is to choose one of the
available answers given a context. In this study, we propose a deep learning
architecture based on the transformer blueprint that predicts the visual
properties of individual objects and their arrangements. The multidimensional
predictions obtained in this way are then directly juxtaposed to choose the answer.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning to perform abstract reasoning often requires decomposing the task in
question into intermediate subgoals that are not specified upfront, but need to
be autonomously devised by the learner. In Raven Progressive Matrices (RPM),
the task is to choose one of the available answers given a context, where both
the context and answers are composite images featuring multiple objects in
various spatial arrangements. As this high-level goal is the only guidance
available, learning to solve RPMs is challenging. In this study, we propose a
deep learning architecture based on the transformer blueprint which, rather
than directly making the above choice, addresses the subgoal of predicting the
visual properties of individual objects and their arrangements. The
multidimensional predictions obtained in this way are then directly juxtaposed
to choose the answer. We consider a few ways in which the model parses the
visual input into tokens and several regimes of masking parts of the input in
self-supervised training. In experimental assessment, the models not only
outperform state-of-the-art methods but also provide interesting insights and
partial explanations about the inference. The design of the method also makes
it immune to biases that are known to be present in some RPM benchmarks.
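The juxtaposition idea can be illustrated with a toy sketch (not the authors' architecture): encode each panel as a vector of discrete visual properties, predict the missing panel's properties from a hypothetical row-wise rule, and select the answer whose properties lie closest to the prediction. The property encoding, the constant-step rule, and the L1 distance are all invented here for illustration.

```python
# Toy sketch: answer selection by juxtaposing predicted visual properties.
# Each panel is a tuple of discrete properties, e.g. (shape id, size, count).

def predict_missing(row):
    """Predict the third panel's properties from the first two, assuming a
    constant-step (arithmetic progression) rule along the row."""
    a, b = row
    return tuple(2 * y - x for x, y in zip(a, b))

def choose_answer(context_row, candidates):
    """Juxtapose the predicted property vector with each candidate answer
    and return the index of the closest one (L1 distance)."""
    target = predict_missing(context_row)
    def dist(c):
        return sum(abs(p - q) for p, q in zip(target, c))
    return min(range(len(candidates)), key=lambda i: dist(candidates[i]))

# The row shows object count increasing by one: 1, 2, -> 3.
context = [(0, 2, 1), (0, 2, 2)]
answers = [(0, 2, 4), (0, 2, 3), (1, 1, 3), (0, 3, 2)]
print(choose_answer(context, answers))  # -> 1, i.e. the panel (0, 2, 3)
```

Because the choice is made in property space rather than by scoring raw answer images, a selector of this shape cannot exploit answer-set statistics, which is the kind of benchmark bias the abstract refers to.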
Related papers
- Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules
in Vector-symbolic Architectures
Abstract reasoning is a cornerstone of human intelligence, and replicating it with artificial intelligence (AI) presents an ongoing challenge.
This study focuses on efficiently solving Raven's progressive matrices (RPM), a visual test for assessing abstract reasoning abilities.
Instead of hard-coding the rule formulations associated with RPMs, our approach can learn the VSA rule formulations with just one pass through the training data.
arXiv Detail & Related papers (2024-01-29T10:17:18Z)
- Tackling the Abstraction and Reasoning Corpus (ARC) with Object-centric
Models and the MDL Principle
We introduce object-centric models that are in line with the natural programs produced by humans.
Our models can not only perform predictions, but also provide joint descriptions for input/output pairs.
A diverse range of tasks are solved, and the learned models are similar to the natural programs.
arXiv Detail & Related papers (2023-11-01T14:25:51Z)
- A Study of Forward-Forward Algorithm for Self-Supervised Learning
We study the performance of forward-forward vs. backpropagation for self-supervised representation learning.
Our main finding is that while the forward-forward algorithm performs comparably to backpropagation during (self-supervised) training, the transfer performance is significantly lagging behind in all the studied settings.
arXiv Detail & Related papers (2023-09-21T10:14:53Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online
Codebook Assignments
Self-supervised learning can be used to mitigate Vision Transformer networks' heavy demand for annotated data.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Learning to reason over visual objects
We investigate the extent to which a general-purpose mechanism for processing visual scenes in terms of objects might help promote abstract visual reasoning.
We find that an inductive bias for object-centric processing may be a key component of abstract visual reasoning.
arXiv Detail & Related papers (2023-03-03T23:19:42Z)
- Deep Non-Monotonic Reasoning for Visual Abstract Reasoning Tasks
This paper proposes a non-monotonic computational approach to solve visual abstract reasoning tasks.
We implement a deep learning model using this approach and test it on the RAVEN dataset -- a dataset inspired by the Raven's Progressive Matrices test.
arXiv Detail & Related papers (2023-02-08T16:35:05Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Exploring Target Representations for Masked Autoencoders
We show that a careful choice of the target representation is unnecessary for learning good representations.
We propose a multi-stage masked distillation pipeline and use a randomly initialized model as the teacher.
A proposed method to perform masked knowledge distillation with bootstrapped teachers (dBOT) outperforms previous self-supervised methods by nontrivial margins.
arXiv Detail & Related papers (2022-09-08T16:55:19Z)
- Raven's Progressive Matrices Completion with Latent Gaussian Process
Priors
Raven's Progressive Matrices (RPM) are widely used in human IQ tests.
We propose a deep latent variable model, in which multiple Gaussian processes are employed as priors of latent variables.
We evaluate the proposed model on RPM-like datasets with multiple continuously-changing visual concepts.
arXiv Detail & Related papers (2021-03-22T17:48:44Z)
- Self-Supervision by Prediction for Object Discovery in Videos
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
- Text Modular Networks: Learning to Decompose Tasks in the Language of
Existing Models
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
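The decomposition pattern described here can be sketched in a few lines (a hypothetical stand-in, not the ModularQA system): a controller routes sub-questions to a factoid QA module and an arithmetic step to a symbolic calculator, then composes the results. The fact table, question strings, and operation names are invented for illustration.

```python
# Hypothetical sketch of multi-hop decomposition in the spirit of ModularQA.
# A table lookup stands in for the neural factoid single-span QA model.
FACTS = {
    "When was the telescope invented?": 1608,
    "When was the microscope invented?": 1590,
}

def qa_module(question):
    # Stand-in for the neural QA model: answer one factoid sub-question.
    return FACTS[question]

def calculator(op, a, b):
    # Symbolic module for arithmetic the QA model cannot do reliably.
    return {"diff": a - b, "sum": a + b}[op]

def answer_multi_hop(sub_q1, sub_q2, op):
    # Controller: answer each sub-question, then combine the results.
    return calculator(op, qa_module(sub_q1), qa_module(sub_q2))

print(answer_multi_hop("When was the telescope invented?",
                       "When was the microscope invented?", "diff"))  # -> 18
```

The interpretability claim follows from this structure: the sub-questions and the calculator call form a human-readable trace of how the final answer was derived.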
arXiv Detail & Related papers (2020-09-01T23:45:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.