Abstract Visual Reasoning with Tangram Shapes
- URL: http://arxiv.org/abs/2211.16492v1
- Date: Tue, 29 Nov 2022 18:57:06 GMT
- Title: Abstract Visual Reasoning with Tangram Shapes
- Authors: Anya Ji and Noriyuki Kojima and Noah Rush and Alane Suhr and Wai Keen Vong and Robert D. Hawkins and Yoav Artzi
- Abstract summary: KiloGram is a resource for studying abstract visual reasoning in humans and machines.
It is both visually and linguistically richer, moving beyond whole shape descriptions to include segmentation maps and part labels.
We use this resource to evaluate the abstract visual reasoning capacities of recent multi-modal models.
- Score: 16.51170712669011
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce KiloGram, a resource for studying abstract visual reasoning in
humans and machines. Drawing on the history of tangram puzzles as stimuli in
cognitive science, we build a richly annotated dataset that, with >1k distinct
stimuli, is orders of magnitude larger and more diverse than prior resources.
It is both visually and linguistically richer, moving beyond whole shape
descriptions to include segmentation maps and part labels. We use this resource
to evaluate the abstract visual reasoning capacities of recent multi-modal
models. We observe that pre-trained weights demonstrate limited abstract
reasoning, which dramatically improves with fine-tuning. We also observe that
explicitly describing parts aids abstract reasoning for both humans and models,
especially when jointly encoding the linguistic and visual inputs. KiloGram is
available at https://lil.nlp.cornell.edu/kilogram .
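To make the evaluation setup concrete, below is a minimal, hypothetical sketch of the kind of zero-shot image-text matching such an evaluation could use. CLIP is one plausible pretrained multimodal model; the file name, candidate descriptions, and choice of model are illustrative assumptions, not the paper's actual protocol:

```python
# Hypothetical sketch: score a tangram image against a whole-shape description
# and a part-augmented description with a pretrained CLIP model.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("tangram.png")  # hypothetical stimulus file
texts = [
    "a rabbit",                               # whole-shape description
    "a rabbit with ears, a head, and a body", # parts made explicit
]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-to-text similarity scores
probs = logits.softmax(dim=-1)
for text, p in zip(texts, probs[0].tolist()):
    print(f"{p:.3f}  {text}")
```

Fine-tuning on KiloGram's paired descriptions would use the same scoring interface; per the abstract, that is where most of the gains appear.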
Related papers
- What Makes a Maze Look Like a Maze? [92.80800000328277]
We introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning.
At the core of DSG are schemas: dependency-graph descriptions of abstract concepts that decompose them into more primitive-level symbols.
We show that DSG significantly improves the abstract visual reasoning performance of vision-language models.
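As a loose illustration of the schema idea, the sketch below encodes one invented dependency graph in plain Python; the concept, parts, and edges are made up for this example, and the paper's actual schema format is not given in this summary:

```python
# Hypothetical schema: an abstract concept decomposed into primitive symbols
# connected by dependency edges, as the summary above describes.
from dataclasses import dataclass

@dataclass
class Schema:
    concept: str                      # abstract concept to ground, e.g. "maze"
    primitives: list[str]             # more primitive, directly groundable symbols
    depends_on: dict[str, list[str]]  # dependency edges between primitives

maze = Schema(
    concept="maze",
    primitives=["walls", "paths", "entrance", "exit"],
    depends_on={
        "paths": ["walls"],     # paths are the space the walls enclose
        "entrance": ["walls"],  # an entrance is a gap in the walls
        "exit": ["walls"],
    },
)

# A grounding pass would resolve primitives in dependency order, querying a
# vision-language model for each, then compose the answers for the abstract query.
for part in maze.primitives:
    print(part, "<-", maze.depends_on.get(part, []))
```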
arXiv Detail & Related papers (2024-09-12T16:41:47Z)
- Neural Causal Abstractions [63.21695740637627]
We develop a new family of causal abstractions by clustering variables and their domains.
We show that such abstractions are learnable in practical settings through Neural Causal Models.
Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.
arXiv Detail & Related papers (2024-01-05T02:00:27Z)
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck in existing approaches to a failure to explore the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z)
- Discrete and continuous representations and processing in deep learning: Looking forward [18.28761409764605]
We argue that combining discrete and continuous representations and their processing will be essential to build systems that exhibit a general form of intelligence.
We suggest and discuss several avenues for improving current neural networks by including discrete elements, combining the advantages of both types of representations.
arXiv Detail & Related papers (2022-01-04T16:30:18Z)
- PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning [135.2892665079159]
We introduce a new large-scale diagnostic visual reasoning dataset named PTR.
PTR contains around 70k synthetic RGBD images with ground-truth object- and part-level annotations.
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes.
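For a sense of what object- and part-level annotations involve, here is a purely hypothetical record layout; PTR's real schema, field names, and mask format are not described in this summary:

```python
# Hypothetical annotation records pairing an object with its labeled parts.
from dataclasses import dataclass

@dataclass
class PartAnnotation:
    name: str       # e.g. "leg", "seat"
    mask_path: str  # path to a per-part segmentation mask

@dataclass
class ObjectAnnotation:
    category: str                    # e.g. "chair"
    bbox: tuple[int, int, int, int]  # x, y, width, height in pixels
    parts: list[PartAnnotation]

chair = ObjectAnnotation(
    category="chair",
    bbox=(120, 80, 64, 96),
    parts=[
        PartAnnotation("seat", "masks/0001_seat.png"),
        PartAnnotation("leg", "masks/0001_leg.png"),
    ],
)
print(chair.category, [p.name for p in chair.parts])
```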
arXiv Detail & Related papers (2021-12-09T18:59:34Z)
- Object-Centric Diagnosis of Visual Reasoning [118.36750454795428]
This paper presents a systematic object-centric diagnosis of visual reasoning on grounding and robustness.
We develop a diagnostic model, the Graph Reasoning Machine.
Our model replaces the purely symbolic visual representation with a probabilistic scene graph and then applies teacher-forcing training to the visual reasoning module.
arXiv Detail & Related papers (2020-12-21T18:59:28Z)
- Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs [106.15931418425906]
We present the first study focused on generating natural language rationales across several complex visual reasoning tasks.
We present RationaleVT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs.
Our experiments show that the base pretrained language model benefits from visual adaptation and that free-text rationalization is a promising research direction to complement model interpretability for complex visual-textual reasoning tasks.
arXiv Detail & Related papers (2020-10-15T05:08:56Z)
- Multi-Granularity Modularized Network for Abstract Visual Reasoning [15.956555435408557]
We focus on Raven's Progressive Matrices, a test designed to measure cognitive reasoning.
Inspired by cognitive studies, we propose a Multi-Granularity Modularized Network (MMoN) to bridge the gap between the processing of raw sensory information and symbolic reasoning.
arXiv Detail & Related papers (2020-07-09T09:54:05Z)