Constellation: Learning relational abstractions over objects for compositional imagination
- URL: http://arxiv.org/abs/2107.11153v1
- Date: Fri, 23 Jul 2021 11:59:40 GMT
- Title: Constellation: Learning relational abstractions over objects for compositional imagination
- Authors: James C.R. Whittington, Rishabh Kabra, Loic Matthey, Christopher P. Burgess, Alexander Lerchner
- Abstract summary: We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step toward explicitly representing visual relationships and using them for complex cognitive procedures.
- Score: 64.99658940906917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning structured representations of visual scenes is currently a major
bottleneck to bridging perception with reasoning. While there has been exciting
progress with slot-based models, which learn to segment scenes into sets of
objects, learning configurational properties of entire groups of objects is
still under-explored. To address this problem, we introduce Constellation, a
network that learns relational abstractions of static visual scenes, and
generalises these abstractions over sensory particularities, thus offering a
potential basis for abstract relational reasoning. We further show that this
basis, along with language association, provides a means to imagine sensory
content in new ways. This work is a first step toward explicitly representing visual relationships and using them for complex cognitive procedures.
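The abstract describes the architecture only at a high level. As a minimal sketch of the core idea, the PyTorch snippet below pools pairwise relations over slot vectors produced by an upstream object-segmentation model; every module name and dimension here is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RelationalAbstraction(nn.Module):
    """Illustrative relation-network-style pooling over object slots.

    Encodes every ordered pair of slots with a shared MLP and sums the
    results, giving a permutation-invariant summary of the scene's
    configuration (a sketch, not the Constellation architecture).
    """
    def __init__(self, slot_dim: int = 64, rel_dim: int = 128):
        super().__init__()
        self.pairwise = nn.Sequential(
            nn.Linear(2 * slot_dim, rel_dim), nn.ReLU(),
            nn.Linear(rel_dim, rel_dim),
        )

    def forward(self, slots: torch.Tensor) -> torch.Tensor:
        # slots: (batch, num_slots, slot_dim)
        b, n, d = slots.shape
        left = slots.unsqueeze(2).expand(b, n, n, d)   # slot i
        right = slots.unsqueeze(1).expand(b, n, n, d)  # slot j
        pairs = torch.cat([left, right], dim=-1)       # (b, n, n, 2*d)
        return self.pairwise(pairs).sum(dim=(1, 2))    # (b, rel_dim)

scene_slots = torch.randn(4, 6, 64)             # 4 scenes, 6 slots each
summary = RelationalAbstraction()(scene_slots)  # (4, 128) relational code
```

Summing over ordered pairs keeps the summary invariant to slot ordering, which is one way a representation can generalize over sensory particularities.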
Related papers
- What Makes a Maze Look Like a Maze? [92.80800000328277]
We introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning.
At the core of DSG are schemas: dependency-graph descriptions of abstract concepts that decompose them into more primitive-level symbols.
We show that DSG significantly improves the abstract visual reasoning performance of vision-language models.
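To make the schema idea concrete, here is a hypothetical dependency-graph schema written as plain Python; the concept name, parts, and edges are invented for illustration and are not drawn from the paper.

```python
# Hypothetical schema for the concept "maze": the abstract concept
# decomposes into more primitive symbols, and edges mark dependencies
# between them (our toy encoding, not DSG's format).
maze_schema = {
    "concept": "maze",
    "parts": ["walls", "paths", "entrance", "exit"],
    "dependencies": [
        ("paths", "walls"),     # paths are defined relative to walls
        ("entrance", "paths"),  # the entrance must connect to a path
        ("exit", "paths"),      # so must the exit
    ],
}
```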
arXiv Detail & Related papers (2024-09-12T16:41:47Z)
- Emergence and Function of Abstract Representations in Self-Supervised Transformers [0.0]
We study the inner workings of small-scale transformers trained to reconstruct partially masked visual scenes.
We show that the network develops intermediate abstract representations, or abstractions, that encode all semantic features of the dataset.
Using precise manipulation experiments, we demonstrate that abstractions are central to the network's decision-making process.
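As a rough illustration of the training setup described above, the sketch below masks image patches and trains a small transformer to reconstruct them; tensor shapes, the masking ratio, and the model size are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Illustrative masked-reconstruction objective (our setup, not the paper's).
patches = torch.randn(8, 16, 32)          # (batch, num_patches, patch_dim)
mask = torch.rand(8, 16) < 0.5            # hide roughly half of the patches
inputs = patches.masked_fill(mask.unsqueeze(-1), 0.0)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
recon = encoder(inputs)                       # predict all patches
loss = (recon - patches)[mask].pow(2).mean()  # score masked positions only
loss.backward()                               # standard self-supervised step
```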
arXiv Detail & Related papers (2023-12-08T20:47:15Z)
- Systematic Visual Reasoning through Object-Centric Relational Abstraction [5.914610036560008]
We introduce OCRA, a model that extracts explicit representations of both objects and abstract relations.
It achieves strong systematic generalization in tasks involving complex visual displays.
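A hedged sketch of such a pipeline follows, reusing the pairwise-relation pooling idea from the Constellation sketch above and adding a scoring head for candidate answers; all names and shapes are ours, not OCRA's.

```python
import torch
import torch.nn as nn

# Object slots -> pairwise relation codes -> a score for a candidate answer
# (illustrative only; OCRA's actual architecture differs).
rel_mlp = nn.Sequential(nn.Linear(2 * 64, 128), nn.ReLU(), nn.Linear(128, 128))
score_head = nn.Linear(128, 1)

def score_candidate(context_slots: torch.Tensor,
                    candidate_slots: torch.Tensor) -> torch.Tensor:
    # Pool relations over the union of context and candidate objects.
    slots = torch.cat([context_slots, candidate_slots], dim=0)   # (n, 64)
    n = slots.size(0)
    pairs = torch.cat([slots.unsqueeze(1).expand(n, n, 64),
                       slots.unsqueeze(0).expand(n, n, 64)], dim=-1)
    relations = rel_mlp(pairs).sum(dim=(0, 1))   # permutation-invariant code
    return score_head(relations)                 # higher = better completion

score = score_candidate(torch.randn(6, 64), torch.randn(3, 64))
```

Picking the answer panel with the highest score is one simple way such relational codes can support systematic-generalization tests.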
arXiv Detail & Related papers (2023-06-04T22:47:17Z)
- Learning Structured Representations of Visual Scenes [1.6244541005112747]
We study how machines can describe the content of an individual image or video using visual relationships as structured representations.
Specifically, we explore how structured representations of visual scenes can be effectively constructed and learned in both the static-image and video settings.
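For concreteness, the most common structured representation of this kind is a scene graph of (subject, predicate, object) triples; the toy encoding below is our illustration, not the thesis code.

```python
from dataclasses import dataclass

# A visual relationship as a (subject, predicate, object) triple.
@dataclass(frozen=True)
class Relationship:
    subject: str
    predicate: str
    object: str

# A tiny scene graph describing a single image.
scene_graph = [
    Relationship("person", "riding", "horse"),
    Relationship("horse", "on", "grass"),
    Relationship("person", "wearing", "hat"),
]
```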
arXiv Detail & Related papers (2022-07-09T05:40:08Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
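A minimal sketch of a slot-level contrastive objective in this spirit, assuming slot i in one augmented view should match slot i in the other, is given below; this is an InfoNCE-style loss, not SlotCon's exact formulation.

```python
import torch
import torch.nn.functional as F

def slot_contrastive_loss(slots_a: torch.Tensor,
                          slots_b: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over slot features from two augmented views of one scene.

    slots_a, slots_b: (num_slots, dim); corresponding rows are positives.
    """
    a = F.normalize(slots_a, dim=-1)
    b = F.normalize(slots_b, dim=-1)
    logits = a @ b.t() / temperature      # cosine similarities, scaled
    targets = torch.arange(a.size(0))     # slot i matches slot i
    return F.cross_entropy(logits, targets)

loss = slot_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```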
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
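One way to picture these semantic-aware visual subspaces is a learned projection per superordinate attribute; the sketch below reflects our reading, with invented attribute names, not the paper's modules.

```python
import torch
import torch.nn as nn

class SuperordinateProjections(nn.Module):
    """One linear subspace per superordinate attribute (illustrative)."""
    def __init__(self, feat_dim: int = 256, sub_dim: int = 32,
                 attributes=("color", "shape", "material")):
        super().__init__()
        self.proj = nn.ModuleDict(
            {a: nn.Linear(feat_dim, sub_dim) for a in attributes})

    def forward(self, feats: torch.Tensor) -> dict:
        # feats: (batch, feat_dim) -> one sub-embedding per attribute,
        # so "red" and "blue" compete only inside the color subspace.
        return {a: p(feats) for a, p in self.proj.items()}

subspaces = SuperordinateProjections()(torch.randn(4, 256))
```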
arXiv Detail & Related papers (2022-05-28T14:27:38Z)
- Compositional Scene Representation Learning via Reconstruction: A Survey [48.33349317481124]
Compositional scene representation learning is a task that enables machines to perceive scenes as compositions of objects, as humans do.
Deep neural networks have proven advantageous for representation learning.
Learning via reconstruction is appealing because it can exploit massive unlabeled data and avoids costly, laborious annotation.
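The shared recipe behind the surveyed methods fits in a few lines: encode a scene, decode it, and penalize reconstruction error on unlabeled images. The sketch below is a deliberately minimal autoencoder, not any specific surveyed model.

```python
import torch
import torch.nn as nn

# Minimal reconstruction objective on unlabeled images (illustrative).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 128))
decoder = nn.Linear(128, 64 * 64 * 3)

images = torch.rand(4, 3, 64, 64)             # no labels required
latents = encoder(images)                     # compressed representation
recon = decoder(latents).view(4, 3, 64, 64)   # back to image space
loss = nn.functional.mse_loss(recon, images)  # pixel reconstruction error
```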
arXiv Detail & Related papers (2022-02-15T02:14:05Z)
- Hierarchical Relational Inference [80.00374471991246]
We propose a novel approach to physical reasoning that models objects as hierarchies of parts that may locally behave separately, but also act more globally as a single whole.
Unlike prior approaches, our method learns in an unsupervised fashion directly from raw visual images.
It explicitly distinguishes multiple levels of abstraction and improves over a strong baseline at modeling synthetic and real-world videos.
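To illustrate the part-whole intuition, the toy function below composes local part offsets down a hierarchy, so each part moves locally while the root moves the object as a whole; this is our example, not HRI's learned model.

```python
# Compose local offsets down a part hierarchy (illustrative only).
def world_positions(node, parent_pos=(0.0, 0.0)):
    x = parent_pos[0] + node["offset"][0]
    y = parent_pos[1] + node["offset"][1]
    positions = {node["name"]: (x, y)}
    for child in node.get("parts", []):
        positions.update(world_positions(child, (x, y)))
    return positions

arm = {"name": "arm", "offset": (0.0, 1.0), "parts": [
    {"name": "hand", "offset": (0.3, 0.0), "parts": [
        {"name": "finger", "offset": (0.05, 0.0)}]}]}

# Shifting the root offset moves every part together (global behavior);
# changing one part's offset moves only its subtree (local behavior).
print(world_positions(arm))
```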
arXiv Detail & Related papers (2020-10-07T20:19:10Z)
- Object-Centric Learning with Slot Attention [43.684193749891506]
We present the Slot Attention module, an architectural component that interfaces with perceptual representations.
Slot Attention produces task-dependent abstract representations which we call slots.
We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions.
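Because the module itself is specified in detail in the paper (Locatello et al., 2020), a compact PyTorch rendering of its iterative attention loop follows; this sketch drops the Gaussian slot initialization, layer norms, and residual MLP of the full module.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Simplified Slot Attention: slots compete for input features."""
    def __init__(self, num_slots: int = 7, dim: int = 64, iters: int = 3):
        super().__init__()
        self.iters, self.scale = iters, dim ** -0.5
        self.to_q, self.to_k, self.to_v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRUCell(dim, dim)
        self.slots_init = nn.Parameter(torch.randn(1, num_slots, dim))

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, num_inputs, dim), e.g. flattened CNN features
        b = inputs.size(0)
        slots = self.slots_init.expand(b, -1, -1)
        k, v = self.to_k(inputs), self.to_v(inputs)
        for _ in range(self.iters):
            q = self.to_q(slots)
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True)  # weighted mean
            updates = attn @ v                            # (b, slots, dim)
            slots = self.gru(updates.reshape(-1, updates.size(-1)),
                             slots.reshape(-1, slots.size(-1))).view_as(updates)
        return slots

slots = SlotAttention()(torch.randn(2, 32 * 32, 64))   # (2, 7, 64)
```

The softmax over the slot axis (dim=1) is what makes slots compete to explain each input feature, which drives the object-centric decomposition.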
arXiv Detail & Related papers (2020-06-26T15:31:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.