Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
- URL: http://arxiv.org/abs/2212.10292v1
- Date: Tue, 20 Dec 2022 14:36:45 GMT
- Title: Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
- Authors: Monika Wysoczańska, Tom Monnier, Tomasz Trzciński, David Picard
- Abstract summary: We introduce a protocol to evaluate visual representations for the task of Visual Question Answering.
In order to decouple visual feature extraction from reasoning, we design a specific attention-based reasoning module.
We compare two types of visual representations, densely extracted local features and object-centric ones, against the performance of a perfect image representation using ground truth.
- Score: 30.16956370267339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in visual representation learning have made it possible
to build an abundance of powerful off-the-shelf features, ready to use for numerous
downstream tasks. This work aims to assess how well these features preserve
information about the objects, such as their spatial location, their visual
properties and their relative relationships. We propose to do so by evaluating
them in the context of visual reasoning, where multiple objects with complex
relationships and different attributes are at play. More specifically, we
introduce a protocol to evaluate visual representations for the task of Visual
Question Answering. In order to decouple visual feature extraction from
reasoning, we design a specific attention-based reasoning module which is
trained on the frozen visual representations to be evaluated, in a spirit
similar to standard feature evaluations relying on shallow networks. We compare
two types of visual representations, densely extracted local features and
object-centric ones, against the performance of a perfect image representation
using ground truth. Our main findings are two-fold. First, despite excellent
performance on classical proxy tasks, such representations fall short of
solving complex reasoning problems. Second, object-centric features better
preserve the critical information necessary to perform visual reasoning. Our
proposed framework shows how to approach this evaluation methodologically.
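To make the evaluation protocol concrete, below is a minimal sketch, assuming a PyTorch-style setup: a frozen off-the-shelf backbone supplies visual features (dense grid features or object-centric tokens), and only a small attention-based reasoning head is trained to answer questions, in the spirit of shallow-probe feature evaluation. All module names, dimensions, the answer-vocabulary size, and the toy tensors are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): probing frozen visual features with a
# small attention-based reasoning head for VQA-style evaluation.
# Module names, dimensions, and the toy data are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionReasoner(nn.Module):
    """Small trainable head: question tokens attend over frozen visual features."""
    def __init__(self, vis_dim=768, txt_dim=300, hid_dim=256, n_heads=4, n_answers=28):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hid_dim)   # project frozen visual features
        self.txt_proj = nn.Linear(txt_dim, hid_dim)   # project question word embeddings
        self.cross_attn = nn.MultiheadAttention(hid_dim, n_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(hid_dim), nn.Linear(hid_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, n_answers),            # answer vocabulary (toy size)
        )

    def forward(self, vis_feats, question):
        v = self.vis_proj(vis_feats)                  # (B, N_regions, hid)
        q = self.txt_proj(question)                   # (B, N_words, hid)
        attended, _ = self.cross_attn(q, v, v)        # questions query visual keys/values
        return self.classifier(attended.mean(dim=1))  # pool words -> answer logits

# Frozen off-the-shelf features: gradients never reach the backbone.
backbone_feats = torch.randn(8, 49, 768)              # e.g. a 7x7 dense feature grid (toy)
question_embed = torch.randn(8, 12, 300)              # toy question word embeddings
labels = torch.randint(0, 28, (8,))                   # toy answer labels

reasoner = AttentionReasoner()
optimizer = torch.optim.Adam(reasoner.parameters(), lr=1e-4)  # only the head is trained
logits = reasoner(backbone_feats.detach(), question_embed)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```

Under this setup, swapping `backbone_feats` between dense local features, object-centric tokens, or ground-truth object descriptions while keeping the head fixed is what isolates how much reasoning-relevant information each representation preserves.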
Related papers
- Take A Step Back: Rethinking the Two Stages in Visual Reasoning [57.16394309170051]
This paper revisits visual reasoning with a two-stage perspective.
It is more efficient to implement symbolization via separate encoders for different data domains while using a shared reasoner.
The proposed two-stage framework achieves impressive generalization ability on various visual reasoning tasks.
arXiv Detail & Related papers (2024-07-29T02:56:19Z)
- Towards A Unified Neural Architecture for Visual Recognition and Reasoning [40.938279131241764]
We propose a unified neural architecture for visual recognition and reasoning with a generic interface (e.g., tokens) for both.
Our framework enables the investigation of how different visual recognition tasks, datasets, and inductive biases can help enable principled temporal reasoning capabilities.
arXiv Detail & Related papers (2023-11-10T20:27:43Z)
- Exploring Predicate Visual Context in Detecting Human-Object Interactions [44.937383506126274]
We study how best to re-introduce image features via cross-attention.
Our model with enhanced predicate visual context (PViC) outperforms state-of-the-art methods on the HICO-DET and V-COCO benchmarks.
arXiv Detail & Related papers (2023-08-11T15:57:45Z)
- Does Visual Pretraining Help End-to-End Reasoning? [81.4707017038019]
We investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks.
We propose a simple and general self-supervised framework which "compresses" each video frame into a small set of tokens.
We observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning.
arXiv Detail & Related papers (2023-07-17T14:08:38Z)
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure of exploring the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z)
- PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning [135.2892665079159]
We introduce a new large-scale diagnostic visual reasoning dataset named PTR.
PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations.
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes.
arXiv Detail & Related papers (2021-12-09T18:59:34Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
- Dynamic Language Binding in Relational Visual Reasoning [67.85579756590478]
We present Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains.
Our method outperforms other methods in sophisticated question-answering tasks wherein multiple object relations are involved.
arXiv Detail & Related papers (2020-04-30T06:26:20Z)
- SHOP-VRB: A Visual Reasoning Benchmark for Object Perception [26.422761228628698]
We present an approach and a benchmark for visual reasoning in robotics applications.
We focus on inferring object properties from visual and text data.
We propose a reasoning system based on symbolic program execution.
arXiv Detail & Related papers (2020-04-06T13:46:54Z)