PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
- URL: http://arxiv.org/abs/2112.05136v1
- Date: Thu, 9 Dec 2021 18:59:34 GMT
- Title: PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
- Authors: Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
- Abstract summary: We introduce a new large-scale diagnostic visual reasoning dataset named PTR.
PTR contains around 70k RGBD synthetic images with ground-truth object- and part-level annotations.
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes.
- Score: 135.2892665079159
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: A critical aspect of human visual perception is the ability to parse visual
scenes into individual objects and further into object parts, forming
part-whole hierarchies. Such composite structures could induce a rich set of
semantic concepts and relations, thus playing an important role in the
interpretation and organization of visual signals as well as for the
generalization of visual perception and reasoning. However, existing visual
reasoning benchmarks mostly focus on objects rather than parts. Visual
reasoning based on the full part-whole hierarchy is much more challenging than
object-centric reasoning due to finer-grained concepts, richer geometry
relations, and more complex physics. Therefore, to better serve part-based
conceptual, relational, and physical reasoning, we introduce a new large-scale
diagnostic visual reasoning dataset named PTR. PTR contains around 70k RGBD
synthetic images with ground-truth object- and part-level annotations regarding
semantic instance segmentation, color attributes, spatial and geometric
relationships, and certain physical properties such as stability. These images
are paired with 700k machine-generated questions covering various reasoning
types, making them a good testbed for visual reasoning models. We
examine several state-of-the-art visual reasoning models on this dataset and
observe that they still make many surprising mistakes in situations where
humans can easily infer the correct answer. We believe this dataset will open
up new opportunities for part-based reasoning.
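
To make the dataset's structure concrete, a single PTR-style sample can be pictured as an RGBD image plus per-object and per-part annotations and a paired question. The record below is a minimal sketch under that reading; every field name is an illustrative assumption, not the dataset's published schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PartAnnotation:
    # Illustrative fields only; the real PTR annotation format may differ.
    part_name: str   # e.g. "chair leg"
    color: str       # color attribute, e.g. "red"
    mask_rle: str    # run-length-encoded instance segmentation mask

@dataclass
class ObjectAnnotation:
    category: str    # e.g. "chair"
    is_stable: bool  # physical property such as stability
    parts: List[PartAnnotation] = field(default_factory=list)

@dataclass
class PTRSample:
    rgb_path: str    # RGB image file
    depth_path: str  # aligned depth map (the D in RGBD)
    question: str    # machine-generated question
    answer: str      # ground-truth answer
    objects: List[ObjectAnnotation] = field(default_factory=list)
```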
Related papers
- Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? [62.984473889987605]
We present a zero-shot framework for fine-grained visual concept learning that leverages a large language model (LLM) and a Visual Question Answering (VQA) system.
We pose the LLM-generated questions along with the query image to a VQA system and aggregate the answers to determine the presence or absence of an object in the test images.
Our experiments demonstrate performance comparable to existing zero-shot visual classification methods and few-shot concept learning approaches; a sketch of the aggregation protocol follows below.
arXiv Detail & Related papers (2024-10-17T15:16:10Z) - Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning [0.7999703756441756]
- Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning [0.7999703756441756]
Human capabilities in understanding visual relations are far superior to those of AI systems.
We develop a system equipped with a novel Glimpse-based Active Perception (GAP) mechanism; a toy glimpse loop is sketched below.
The results suggest that the GAP is essential for extracting visual relations that go beyond the immediate visual content.
arXiv Detail & Related papers (2024-09-30T11:48:11Z) - Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative
- Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach [3.8073142980733]
Achieving visual reasoning is a long-term goal of artificial intelligence.
In recent years, object-centric representation learning has been put forward as a way to achieve visual reasoning.
We show that object-centric models are able to segregate the different objects in a scene, even in many out-of-distribution cases.
arXiv Detail & Related papers (2024-02-20T02:48:14Z) - Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know
How to Reason? [30.16956370267339]
We introduce a protocol to evaluate visual representations for the task of Visual Question Answering.
In order to decouple visual feature extraction from reasoning, we design a specific attention-based reasoning module; a toy version is sketched below.
We compare two types of visual representations, densely extracted local features and object-centric ones, against the performance of a perfect image representation using ground truth.
arXiv Detail & Related papers (2022-12-20T14:36:45Z) - Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces; one reading of this idea is sketched below.
arXiv Detail & Related papers (2022-05-28T14:27:38Z) - ComPhy: Compositional Physical Reasoning of Objects and Events from
- ComPhy: Compositional Physical Reasoning of Objects and Events from Videos [113.2646904729092]
The compositionality of visible and hidden physical properties poses unique challenges for AI models reasoning about the physical world.
Existing studies on video reasoning mainly focus on visually observable elements such as object appearance, movement, and contact interaction.
We propose an oracle neural-symbolic framework named Compositional Physics Learner (CPL), combining visual perception, physical property learning, dynamic prediction, and symbolic execution; the pipeline is sketched schematically below.
arXiv Detail & Related papers (2022-05-02T17:59:13Z) - Constellation: Learning relational abstractions over objects for
- Constellation: Learning relational abstractions over objects for compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step toward explicitly representing visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z) - Object-Centric Diagnosis of Visual Reasoning [118.36750454795428]
This paper presents a systematic object-centric diagnosis of visual reasoning on grounding and robustness.
We develop a diagnostic model, namely Graph Reasoning Machine.
Our model replaces the purely symbolic visual representation with a probabilistic scene graph and then applies teacher-forcing training to the visual reasoning module; a minimal illustration of such a graph follows below.
arXiv Detail & Related papers (2020-12-21T18:59:28Z)