Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative
Cognition Approach
- URL: http://arxiv.org/abs/2402.12675v1
- Date: Tue, 20 Feb 2024 02:48:14 GMT
- Title: Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative
Cognition Approach
- Authors: Guillermo Puebla and Jeffrey S. Bowers
- Abstract summary: Achieving visual reasoning is a long-term goal of artificial intelligence.
In recent years, object-centric representation learning has been put forward as a way to achieve visual reasoning.
We show that object-centric models are able to segregate the different objects in a scene, even in many out-of-distribution cases.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving visual reasoning is a long-term goal of artificial intelligence. In
the last decade, several studies have applied deep neural networks (DNNs) to
the task of learning visual relations from images, with modest results in terms
of generalization of the relations learned. However, in recent years,
object-centric representation learning has been put forward as a way to achieve
visual reasoning within the deep learning framework. Object-centric models
attempt to model input scenes as compositions of objects and relations between
them. To this end, these models use several kinds of attention mechanisms to
segregate the individual objects in a scene from the background and from other
objects. In this work we tested relation learning and generalization in several
object-centric models, as well as a ResNet-50 baseline. In contrast to previous
research, which has focused heavily on the same-different task in order to
assess relational reasoning in DNNs, we use a set of tasks -- with varying
degrees of difficulty -- derived from the comparative cognition literature. Our
results show that object-centric models are able to segregate the different
objects in a scene, even in many out-of-distribution cases. In our simpler
tasks, this improves their capacity to learn and generalize visual relations in
comparison to the ResNet-50 baseline. However, object-centric models still
struggle in our more difficult tasks and conditions. We conclude that abstract
visual reasoning remains an open challenge for DNNs, including object-centric
models.
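The object segregation described in the abstract is typically implemented with a slot-attention-style mechanism: a fixed set of slot vectors iteratively compete for input features through a softmax taken over the slots, so each feature is claimed by one slot. The following is a minimal sketch of that idea in PyTorch; the class name, dimensions, slot count, and iteration count are illustrative assumptions, not the exact architectures evaluated in the paper.

```python
# Minimal slot-attention-style sketch (assumed configuration, for illustration).
import torch
import torch.nn as nn

class SlotAttentionSketch(nn.Module):
    def __init__(self, num_slots=4, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.iters = num_slots, iters
        self.scale = dim ** -0.5
        # Slots are sampled from a learned Gaussian at the start of each forward pass.
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, inputs):  # inputs: (batch, num_features, dim)
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(
            b, self.num_slots, d, device=inputs.device)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the slot dimension: slots compete for each input
            # feature, which is what segregates objects from the background.
            attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=1)
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)  # per-slot weighted mean
            updates = attn @ v  # (batch, num_slots, dim)
            slots = self.gru(updates.reshape(-1, d),
                             slots.reshape(-1, d)).reshape(b, -1, d)
        return slots  # one vector per putative object
```

Feeding in CNN feature maps flattened to (batch, H*W, dim) yields one vector per putative object, which a downstream relational module can then compare, e.g., to decide whether two objects are the same or different.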
Related papers
- OC-NMN: Object-centric Compositional Neural Module Network for
Generative Visual Analogical Reasoning [49.12350554270196]
We show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination.
Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language.
arXiv Detail & Related papers (2023-10-28T20:12:58Z) - Systematic Visual Reasoning through Object-Centric Relational
Abstraction [5.914610036560008]
We introduce OCRA, a model that extracts explicit representations of both objects and abstract relations.
It achieves strong systematic generalization in tasks involving complex visual displays.
arXiv Detail & Related papers (2023-06-04T22:47:17Z) - The role of object-centric representations, guided attention, and
external memory on generalizing visual relations [0.6091702876917281]
We evaluate a series of deep neural networks (DNNs) that integrate mechanisms such as slot attention, recurrently guided attention, and external memory.
We find that, although some models performed better than others in generalizing the same-different relation to specific types of images, no model was able to generalize this relation across the board.
arXiv Detail & Related papers (2023-04-14T12:22:52Z) - Deep Non-Monotonic Reasoning for Visual Abstract Reasoning Tasks [3.486683381782259]
This paper proposes a non-monotonic computational approach to solve visual abstract reasoning tasks.
We implement a deep learning model using this approach and test it on the RAVEN dataset -- a dataset inspired by the Raven's Progressive Matrices test.
arXiv Detail & Related papers (2023-02-08T16:35:05Z) - Sparse Relational Reasoning with Object-Centric Representations [78.83747601814669]
We investigate the composability of soft-rules learned by relational neural architectures when operating over object-centric representations.
We find that increasing sparsity, especially on features, improves the performance of some models and leads to simpler relations.
arXiv Detail & Related papers (2022-07-15T14:57:33Z) - SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric
Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z) - Bi-directional Object-context Prioritization Learning for Saliency
Ranking [60.62461793691836]
Existing approaches focus on learning either object-object or object-scene relations.
We observe that spatial attention works concurrently with object-based attention in the human visual recognition system.
We propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking.
arXiv Detail & Related papers (2022-03-17T16:16:03Z) - PTR: A Benchmark for Part-based Conceptual, Relational, and Physical
Reasoning [135.2892665079159]
We introduce a new large-scale diagnostic visual reasoning dataset named PTR.
PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations.
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes.
arXiv Detail & Related papers (2021-12-09T18:59:34Z) - Causal Navigation by Continuous-time Neural Networks [108.84958284162857]
We propose a theoretical and experimental framework for learning causal representations using continuous-time neural networks.
We evaluate our method in the context of visual-control learning of drones over a series of complex tasks.
arXiv Detail & Related papers (2021-06-15T17:45:32Z) - Visual Relationship Detection with Visual-Linguistic Knowledge from
Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Visual-Linguistic Representations from Transformers (RVL-BERT).
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z)