Compositional Scene Representation Learning via Reconstruction: A Survey
- URL: http://arxiv.org/abs/2202.07135v4
- Date: Wed, 14 Jun 2023 16:25:03 GMT
- Title: Compositional Scene Representation Learning via Reconstruction: A Survey
- Authors: Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue
- Abstract summary: Compositional scene representation learning is a task that enables such abilities.
Deep neural networks have been proven to be advantageous in representation learning.
Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation.
- Score: 48.33349317481124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual scenes are composed of visual concepts and have the property of
combinatorial explosion. An important reason for humans to efficiently learn
from diverse visual scenes is the ability of compositional perception, and it
is desirable for artificial intelligence to have similar abilities.
Compositional scene representation learning is a task that enables such
abilities. In recent years, various methods have been proposed to apply deep
neural networks, which have been proven to be advantageous in representation
learning, to learn compositional scene representations via reconstruction,
advancing this research direction into the deep learning era. Learning via
reconstruction is advantageous because it may utilize massive unlabeled data
and avoid costly and laborious data annotation. In this survey, we first
outline the current progress on reconstruction-based compositional scene
representation learning with deep neural networks, including development
history and categorizations of existing methods from the perspectives of the
modeling of visual scenes and the inference of scene representations; then
provide benchmarks, including an open source toolbox to reproduce the benchmark
experiments, of representative methods that consider the most extensively
studied problem setting and form the foundation for other methods; and finally
discuss the limitations of existing methods and future directions of this
research topic.
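As a rough, hypothetical illustration of what "compositional scene representation learning via reconstruction" means in practice (not the survey's benchmark toolbox or any specific method it covers), the sketch below trains a K-slot mixture autoencoder on unlabeled images: each slot decodes to an appearance and a mask, the scene is reassembled as a pixel-wise mixture over slots, and the only training signal is the reconstruction error. All module names, architecture choices, and hyper-parameters here are assumptions made for the example.

```python
# Hypothetical minimal sketch of reconstruction-based compositional
# scene representation learning; not taken from the surveyed methods.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotAutoencoder(nn.Module):
    def __init__(self, num_slots=4, slot_dim=32, img_size=64):
        super().__init__()
        self.num_slots, self.slot_dim, self.img_size = num_slots, slot_dim, img_size
        # Encoder: image -> K slot vectors (simple CNN + projection).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (img_size // 4) ** 2, num_slots * slot_dim),
        )
        # Decoder (shared across slots): one slot vector ->
        # RGB appearance (3 channels) + mask logit (1 channel).
        self.decoder = nn.Sequential(
            nn.Linear(slot_dim, 64 * (img_size // 4) ** 2), nn.ReLU(),
            nn.Unflatten(1, (64, img_size // 4, img_size // 4)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 4, 4, 2, 1),
        )

    def forward(self, x):
        b = x.shape[0]
        slots = self.encoder(x).view(b * self.num_slots, self.slot_dim)
        out = self.decoder(slots).view(b, self.num_slots, 4, self.img_size, self.img_size)
        rgb, mask_logits = out[:, :, :3], out[:, :, 3:]
        masks = torch.softmax(mask_logits, dim=1)   # pixel-wise mixture weights over slots
        recon = (masks * rgb).sum(dim=1)            # compose slot appearances into one image
        return recon, masks

# Training uses only unlabeled images: minimize the reconstruction error.
model = SlotAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(8, 3, 64, 64)                  # stand-in for an unlabeled image batch
recon, masks = model(images)
loss = F.mse_loss(recon, images)
loss.backward()
optimizer.step()
```

The per-slot masks are what make the representation compositional: each object (or the background) can be explained by a different slot, and no annotation is required because the objective is purely to reconstruct the input.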
Related papers
- Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable neural networks to identify individual objects in visual scenes.
RHGNet introduces a top-down pathway that works in different ways in the training and inference processes.
Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- Context-driven Visual Object Recognition based on Knowledge Graphs [0.8701566919381223]
We propose an approach that enhances deep learning methods by using external contextual knowledge encoded in a knowledge graph.
We conduct a series of experiments to investigate the impact of different contextual views on the learned object representations for the same image dataset.
arXiv Detail & Related papers (2022-10-20T13:09:00Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which influences several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Learning Structured Representations of Visual Scenes [1.6244541005112747]
We study how machines can describe the content of individual images or videos using visual relationships as structured representations.
Specifically, we explore how structured representations of visual scenes can be effectively constructed and learned in both the static-image and video settings.
arXiv Detail & Related papers (2022-07-09T05:40:08Z)
- Constellation: Learning relational abstractions over objects for compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z)
- Knowledge-Guided Object Discovery with Acquired Deep Impressions [41.07379505694274]
We present a framework called Acquired Deep Impressions (ADI), which continuously learns knowledge of objects as "impressions".
ADI first acquires knowledge from scene images containing a single object in a supervised manner.
It then learns from novel multi-object scene images which may contain objects that have not been seen before.
arXiv Detail & Related papers (2021-03-19T03:17:57Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.