GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement
- URL: http://arxiv.org/abs/2104.09958v2
- Date: Wed, 21 Apr 2021 14:52:11 GMT
- Title: GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement
- Authors: Martin Engelcke, Oiwi Parker Jones, Ingmar Posner
- Abstract summary: We develop a new model, GENESIS-V2, which can infer a variable number of object representations without using RNNs or iterative refinement.
We show that GENESIS-V2 outperforms previous methods for unsupervised image segmentation and object-centric scene generation on established synthetic datasets.
- Score: 26.151968529063762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in object-centric generative models (OCGMs) have culminated in the
development of a broad range of methods for unsupervised object segmentation
and interpretable object-centric scene generation. These methods, however, are
restricted to simulated and real-world datasets of low visual complexity.
Moreover, object representations are often inferred either with RNNs, which do
not scale well to large images, or with iterative refinement, which avoids
imposing an unnatural ordering on the objects in an image but requires a fixed
number of object representations to be initialised a priori. In contrast to
established paradigms, this work proposes an embedding-based approach in which
embeddings of pixels are clustered in a differentiable fashion using a
stochastic, non-parametric stick-breaking process. Similar to iterative
refinement, this clustering procedure also leads to randomly ordered object
representations, but without the need to initialise a fixed number of
clusters a priori. This is used to develop a new model, GENESIS-V2, which can
infer a variable number of object representations without using RNNs or
iterative refinement. We show that GENESIS-V2 outperforms previous methods for
unsupervised image segmentation and object-centric scene generation on
established synthetic datasets as well as more complex real-world datasets.
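The clustering step described in the abstract can be illustrated with a small, framework-free sketch of a stochastic stick-breaking process over pixel embeddings. It is a hypothetical illustration, not the GENESIS-V2 implementation: the Gaussian kernel, the `bandwidth` and `min_scope` parameters, the rule that draws each seed pixel with probability proportional to its remaining scope, and the function names are all assumptions, and in the model the embeddings come from a learned encoder rather than being written by hand. It shows why the number of masks is not fixed in advance (the loop stops once almost no scope remains) and why the order of the masks is random (it depends on which seed pixels are drawn).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Embedding = Tuple[float, ...]


def squared_distance(a: Embedding, b: Embedding) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stick_breaking_masks(
    embeddings: Sequence[Embedding],
    max_steps: int,
    bandwidth: float = 1.0,
    min_scope: float = 1e-3,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Softly cluster pixel embeddings with a stochastic stick-breaking process.

    Hypothetical sketch: the Gaussian kernel, `bandwidth`, `min_scope` and the
    seed-sampling rule are assumptions, not the paper's exact choices.
    Returns a variable number of soft masks that sum to 1 at every pixel.
    """
    rng = rng or random.Random()
    n = len(embeddings)
    if n == 0:
        return []
    scope = [1.0] * n  # share of each pixel not yet claimed by any mask
    masks: List[List[float]] = []
    for _ in range(max_steps):
        if sum(scope) <= min_scope * n:
            break  # (almost) every pixel is explained: stop early
        # Seed pixel drawn with probability proportional to its remaining scope,
        # so the resulting order of the masks is random rather than fixed.
        seed = rng.choices(range(n), weights=scope, k=1)[0]
        centre = embeddings[seed]
        # Soft membership in [0, 1]; the seed itself (and identical embeddings) get 1.
        alpha = [math.exp(-squared_distance(e, centre) / bandwidth) for e in embeddings]
        masks.append([s * a for s, a in zip(scope, alpha)])
        # Break the stick: whatever this mask did not claim stays in scope.
        scope = [s * (1.0 - a) for s, a in zip(scope, alpha)]
    masks.append(scope)  # leftover scope becomes the final mask
    return masks


if __name__ == "__main__":
    # Two well-separated groups of 2-D "pixel embeddings".
    embs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    result = stick_breaking_masks(embs, max_steps=5, bandwidth=0.5, rng=random.Random(0))
    print(len(result), "masks")
```

Because the seed pixel's own membership is always 1, each step fully removes at least one pixel from the scope, so the loop terminates even without the `max_steps` cap once every pixel has been claimed.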
Related papers
- Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching [19.730504197461144]
We present a novel generalizable object pose estimation method to determine the object pose using only one RGB image.
Our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.
(2024-11-24T14:31:50Z)
- Segmenting objects with Bayesian fusion of active contour models and convnet priors [0.729597981661727]
We propose a novel instance segmentation method geared towards Natural Resource Monitoring (NRM) imagery.
We formulate the problem as Bayesian maximum a posteriori inference which, in learning the individual object contours, incorporates shape, location, and position priors.
In experiments, we tackle the challenging, real-world problem of segmenting individual dead tree crowns and precise contours.
(2024-10-09T20:36:43Z)
- Object-centric architectures enable efficient causal representation learning [51.6196391784561]
We show that when the observations are of multiple objects, the generative function is no longer injective and disentanglement fails in practice.
We develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties.
This approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space.
(2023-10-29T16:01:03Z)
- Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
(2023-03-20T18:19:36Z)
- Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation [102.25240608024063]
Referring image segmentation segments an image from a language expression.
We develop an algorithm that shifts from being localization-centric to segmentation-language.
Compared to its counterparts, our method is more versatile yet effective.
(2023-03-11T08:42:40Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves unsupervised object discovery performance competitive with a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails, all while being 7-70 times faster to train.
(2022-04-05T09:25:28Z)
- Hybrid Generative Models for Two-Dimensional Datasets [5.206057210246861]
Two-dimensional array-based datasets are pervasive in a variety of domains.
Current approaches for generative modeling have typically been limited to conventional image datasets.
We propose a novel approach for generating two-dimensional datasets by moving the computations to the space of representation bases.
(2021-06-01T03:21:47Z)
- CellSegmenter: unsupervised representation learning and instance segmentation of modular images [0.0]
We introduce a structured deep generative model and an amortized inference framework for unsupervised representation learning and instance segmentation tasks.
The proposed inference algorithm is convolutional and parallelized, without any recurrent mechanisms.
We show segmentation results obtained for a cell nuclei imaging dataset, demonstrating the ability of our method to provide high-quality segmentations.
(2020-11-25T02:10:58Z)
- Neural Star Domain as Primitive Representation [65.7313602687861]
We propose a novel primitive representation named neural star domain (NSD) that learns primitive shapes in the star domain.
NSD is a universal approximator of the star domain and is not only parsimonious and semantic but also an implicit and explicit shape representation.
We demonstrate that our approach outperforms existing methods in image reconstruction tasks, semantic capabilities, and speed and quality of sampling high-resolution meshes.
(2020-10-21T19:05:16Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, which is better suited for multi-object images.
(2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.