The role of object-centric representations, guided attention, and
external memory on generalizing visual relations
- URL: http://arxiv.org/abs/2304.07091v1
- Date: Fri, 14 Apr 2023 12:22:52 GMT
- Title: The role of object-centric representations, guided attention, and
external memory on generalizing visual relations
- Authors: Guillermo Puebla and Jeffrey S. Bowers
- Abstract summary: We evaluate a series of deep neural networks (DNNs) that integrate mechanisms such as slot attention, recurrently guided attention, and external memory.
We find that, although some models performed better than others in generalizing the same-different relation to specific types of images, no model was able to generalize this relation across the board.
- Score: 0.6091702876917281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual reasoning is a long-term goal of vision research. In the last decade,
several works have attempted to apply deep neural networks (DNNs) to the task
of learning visual relations from images, with modest results in terms of the
generalization of the relations learned. In recent years, several innovations
in DNNs have been developed to enable learning abstract relations from
images. In this work, we systematically evaluate a series of DNNs that
integrate mechanisms such as slot attention, recurrently guided attention, and
external memory, in the simplest possible visual reasoning task: deciding
whether two objects are the same or different. We found that, although some
models performed better than others in generalizing the same-different relation
to specific types of images, no model was able to generalize this relation
across the board. We conclude that abstract visual reasoning remains largely an
unresolved challenge for DNNs.
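As a concrete illustration of the task the paper studies, the following is a minimal sketch of a same-different stimulus generator in NumPy. The function name, patch size, and placement scheme are illustrative assumptions, not the stimuli actually used in the paper.

```python
import numpy as np

def make_same_different_pair(rng, size=32, same=True):
    """Place two 5x5 random binary patches on a blank canvas.

    If `same` is True, both patches are identical; otherwise the second
    patch is resampled until it differs from the first. Returns
    (image, label), where label is 1 for "same" and 0 for "different".
    """
    canvas = np.zeros((size, size), dtype=np.float32)
    patch_a = rng.integers(0, 2, size=(5, 5)).astype(np.float32)
    patch_b = patch_a.copy()
    if not same:
        while np.array_equal(patch_a, patch_b):
            patch_b = rng.integers(0, 2, size=(5, 5)).astype(np.float32)
    # Place the patches in the left and right halves so they never overlap.
    ya, xa = rng.integers(0, size - 5), rng.integers(0, size // 2 - 5)
    yb, xb = rng.integers(0, size - 5), rng.integers(size // 2, size - 5)
    canvas[ya:ya + 5, xa:xa + 5] = patch_a
    canvas[yb:yb + 5, xb:xb + 5] = patch_b
    return canvas, int(same)

rng = np.random.default_rng(0)
img, label = make_same_different_pair(rng, same=False)
```

The generalization question the paper asks is whether a model trained on pairs like these can still judge "same" versus "different" for object types it has never seen during training.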
Related papers
- Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative
Cognition Approach [3.8073142980733]
Achieving visual reasoning is a long-term goal of artificial intelligence.
In recent years, object-centric representation learning has been put forward as a way to achieve visual reasoning.
We show that object-centric models are able to segregate the different objects in a scene, even in many out-of-distribution cases.
arXiv Detail & Related papers (2024-02-20T02:48:14Z)
- OC-NMN: Object-centric Compositional Neural Module Network for
Generative Visual Analogical Reasoning [49.12350554270196]
We show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination.
Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language.
arXiv Detail & Related papers (2023-10-28T20:12:58Z)
- Systematic Visual Reasoning through Object-Centric Relational
Abstraction [5.914610036560008]
We introduce OCRA, a model that extracts explicit representations of both objects and abstract relations.
It achieves strong systematic generalizations in tasks involving complex visual displays.
arXiv Detail & Related papers (2023-06-04T22:47:17Z)
- Transferability of coVariance Neural Networks and Application to
Interpretable Brain Age Prediction using Anatomical Features [119.45320143101381]
Graph convolutional networks (GCN) leverage topology-driven graph convolutional operations to combine information across the graph for inference tasks.
We have studied GCNs with covariance matrices as graphs, in the form of coVariance neural networks (VNNs).
VNNs inherit the scale-free data-processing architecture of GCNs; here, we show that VNNs exhibit transferability of performance across datasets whose covariance matrices converge to a limit object.
arXiv Detail & Related papers (2023-05-02T22:15:54Z)
- A domain adaptive deep learning solution for scanpath prediction of
paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which underpins several cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- GAMR: A Guided Attention Model for (visual) Reasoning [7.919213739992465]
Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes.
We present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR).
GAMR posits that the brain solves complex visual reasoning problems dynamically via sequences of attention shifts to select and route task-relevant visual information into memory.
arXiv Detail & Related papers (2022-06-10T07:52:06Z)
- DORA: Exploring Outlier Representations in Deep Neural Networks [0.0]
We present DORA, the first data-agnostic framework for analyzing the representational space of Deep Neural Networks (DNNs).
Central to our framework is the proposed Extreme-Activation (EA) distance measure, which assesses similarities between representations.
We validate the EA metric quantitatively, demonstrating its effectiveness both in controlled scenarios and real-world applications.
arXiv Detail & Related papers (2022-06-09T14:25:14Z)
- Prune and distill: similar reformatting of image information along rat
visual cortex and deep neural networks [61.60177890353585]
Deep convolutional neural networks (CNNs) have been shown to provide excellent models of their functional analogue in the brain, the ventral stream of visual cortex.
Here we consider some prominent statistical patterns that are known to exist in the internal representations of either CNNs or the visual cortex.
We show that CNNs and visual cortex share a similarly tight relationship between dimensionality expansion/reduction of object representations and reformatting of image information.
arXiv Detail & Related papers (2022-05-27T08:06:40Z)
- Understanding the computational demands underlying visual reasoning [10.308647202215708]
We systematically assess the ability of modern deep convolutional neural networks to learn to solve visual reasoning problems.
Our analysis leads to a novel taxonomy of visual reasoning tasks, which can be primarily explained by the type of relations and the number of relations used to compose the underlying rules.
arXiv Detail & Related papers (2021-08-08T10:46:53Z)
- Constellation: Learning relational abstractions over objects for
compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z)
- Visual Relationship Detection with Visual-Linguistic Knowledge from
Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Visual-Linguistic Representations from Transformers (RVL-BERT).
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.