GENESIS-V2: Inferring Unordered Object Representations without Iterative
Refinement
- URL: http://arxiv.org/abs/2104.09958v2
- Date: Wed, 21 Apr 2021 14:52:11 GMT
- Title: GENESIS-V2: Inferring Unordered Object Representations without Iterative
Refinement
- Authors: Martin Engelcke, Oiwi Parker Jones, Ingmar Posner
- Abstract summary: We develop a new model, GENESIS-V2, which can infer a variable number of object representations without using RNNs or iterative refinement.
We show that GENESIS-V2 outperforms previous methods for unsupervised image segmentation and object-centric scene generation on established synthetic datasets.
- Score: 26.151968529063762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in object-centric generative models (OCGMs) have culminated in the
development of a broad range of methods for unsupervised object segmentation
and interpretable object-centric scene generation. These methods, however, are
limited to simulated and real-world datasets with limited visual complexity.
Moreover, object representations are often inferred using RNNs, which do not
scale well to large images, or iterative refinement, which avoids imposing an
unnatural ordering on objects in an image but requires the a priori
initialisation of a fixed number of object representations. In contrast to
established paradigms, this work proposes an embedding-based approach in which
embeddings of pixels are clustered in a differentiable fashion using a
stochastic, non-parametric stick-breaking process. Similar to iterative
refinement, this clustering procedure also leads to randomly ordered object
representations, but without the need to initialise a fixed number of
clusters a priori. This is used to develop a new model, GENESIS-V2, which can
infer a variable number of object representations without using RNNs or
iterative refinement. We show that GENESIS-V2 outperforms previous methods for
unsupervised image segmentation and object-centric scene generation on
established synthetic datasets as well as more complex real-world datasets.
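The abstract's clustering procedure can be sketched as follows. This is an illustrative approximation, not the paper's exact stick-breaking process: the seed-sampling rule, the Gaussian kernel, and all names (`stick_breaking_cluster`, `sigma`, `max_clusters`) are assumptions made for exposition.

```python
import numpy as np

def stick_breaking_cluster(embeddings, max_clusters=5, sigma=1.0, seed=0):
    """Sketch of stochastic stick-breaking clustering of pixel embeddings.

    embeddings: array of shape (num_pixels, dim).
    Returns soft masks of shape (num_clusters, num_pixels) that sum to 1
    over clusters for every pixel.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    scope = np.ones(n)  # probability mass not yet explained by any cluster
    masks = []
    for _ in range(max_clusters - 1):
        if scope.sum() < 1e-6:
            break
        # Sample a seed pixel in proportion to the remaining scope.
        seed_idx = rng.choice(n, p=scope / scope.sum())
        # Soft attention: Gaussian kernel on distance to the seed embedding.
        d2 = ((embeddings - embeddings[seed_idx]) ** 2).sum(axis=1)
        alpha = np.exp(-d2 / (2 * sigma ** 2))
        masks.append(scope * alpha)    # break a piece off the stick
        scope = scope * (1.0 - alpha)  # shrink what remains unexplained
    masks.append(scope)  # final mask absorbs the leftover scope
    return np.stack(masks)
```

Because each mask is a fraction of the remaining scope, the masks telescope to exactly one per pixel, and the number of non-trivial clusters adapts to the data rather than being fixed a priori, mirroring the variable-slot behaviour the abstract claims.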
Related papers
- Segmenting objects with Bayesian fusion of active contour models and convnet priors [0.729597981661727]
We propose a novel instance segmentation method geared towards Natural Resource Monitoring (NRM) imagery.
We formulate the problem as Bayesian maximum a posteriori inference which, in learning the individual object contours, incorporates shape, location, and position priors.
In experiments, we tackle the challenging, real-world problem of segmenting individual dead tree crowns and delineating their precise contours.
arXiv Detail & Related papers (2024-10-09T20:36:43Z)
- Object-centric architectures enable efficient causal representation learning [51.6196391784561]
We show that when the observations are of multiple objects, the generative function is no longer injective and disentanglement fails in practice.
We develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties.
This approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space.
arXiv Detail & Related papers (2023-10-29T16:01:03Z)
- Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)
- Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation [102.25240608024063]
Referring image segmentation segments out the image region described by a natural language expression.
We develop an algorithm that progressively shifts from localization-centric processing to fine-grained, language-guided segmentation.
Compared to its counterparts, our method is more versatile while remaining effective.
arXiv Detail & Related papers (2023-03-11T08:42:40Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
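As background for why complex-valued activations can separate objects without slots: the phase of a complex activation can act as a soft grouping variable, with pixels of the same object sharing similar phases. The toy sketch below is not the Complex AutoEncoder itself; the synthetic phases and the threshold rule are assumptions chosen purely to illustrate phase-based grouping.

```python
import numpy as np

# Toy illustration of phase-based binding in complex-valued activations.
rng = np.random.default_rng(0)
magnitudes = rng.uniform(0.5, 1.5, size=10)  # feature strength per pixel
# Two synthetic "objects": the first 5 pixels share phase ~0,
# the last 5 share phase ~pi.
phases = np.concatenate([rng.normal(0.0, 0.1, 5), rng.normal(np.pi, 0.1, 5)])
z = magnitudes * np.exp(1j * phases)  # complex activations
# Recover the grouping by thresholding the phase (angle) of each activation.
groups = (np.abs(np.angle(z)) > np.pi / 2).astype(int)
```

The magnitude carries "what" information while the phase carries "which object", so a single distributed code can represent several objects at once instead of allocating one slot per object.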
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Hybrid Generative Models for Two-Dimensional Datasets [5.206057210246861]
Two-dimensional array-based datasets are pervasive in a variety of domains.
Current approaches for generative modeling have typically been limited to conventional image datasets.
We propose a novel approach for generating two-dimensional datasets by moving the computations to the space of representation bases.
arXiv Detail & Related papers (2021-06-01T03:21:47Z)
- CellSegmenter: unsupervised representation learning and instance segmentation of modular images [0.0]
We introduce a structured deep generative model and an amortized inference framework for unsupervised representation learning and instance segmentation tasks.
The proposed inference algorithm is convolutional and parallelized, without any recurrent mechanisms.
We show segmentation results obtained for a cell nuclei imaging dataset, demonstrating the ability of our method to provide high-quality segmentations.
arXiv Detail & Related papers (2020-11-25T02:10:58Z)
- Neural Star Domain as Primitive Representation [65.7313602687861]
We propose a novel primitive representation named neural star domain (NSD) that learns primitive shapes in the star domain.
NSD is a universal approximator of the star domain; it is not only parsimonious and semantic but also serves as both an implicit and an explicit shape representation.
We demonstrate that our approach outperforms existing methods in image reconstruction tasks, semantic capabilities, and speed and quality of sampling high-resolution meshes.
arXiv Detail & Related papers (2020-10-21T19:05:16Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.