Object-Centric Image Generation with Factored Depths, Locations, and
Appearances
- URL: http://arxiv.org/abs/2004.00642v1
- Date: Wed, 1 Apr 2020 18:00:11 GMT
- Title: Object-Centric Image Generation with Factored Depths, Locations, and
Appearances
- Authors: Titas Anciukevicius, Christoph H. Lampert, Paul Henderson
- Abstract summary: We present a generative model of images that explicitly reasons over the set of objects they show.
Our model learns a structured latent representation that separates objects from each other and from the background.
It can be trained from images alone in a purely unsupervised fashion without the need for object masks or depth information.
- Score: 30.541425619507184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a generative model of images that explicitly reasons over the set
of objects they show. Our model learns a structured latent representation that
separates objects from each other and from the background; unlike prior works,
it explicitly represents the 2D position and depth of each object, as well as
an embedding of its segmentation mask and appearance. The model can be trained
from images alone in a purely unsupervised fashion without the need for object
masks or depth information. Moreover, it always generates complete objects,
even though a significant fraction of training images contain occlusions.
Finally, we show that our model can infer decompositions of novel images into
their constituent objects, including accurate prediction of depth ordering and
segmentation of occluded parts.
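To make the factored representation described in the abstract more concrete, here is a minimal sketch (not the authors' implementation) of how per-object depth, 2D position, mask, and appearance could be combined into an image by painting objects from far to near. The function name `composite`, the dictionary keys, and the convention that a larger depth value means farther from the camera are all assumptions made for illustration. The abstract describes masks and appearances as learned embeddings; this sketch replaces them with explicit arrays.

```python
import numpy as np


def composite(background, objects):
    """Render objects onto a background in depth order (hypothetical helper).

    background: float array of shape (H, W, 3).
    objects: list of dicts with assumed keys
        'depth'      - scalar, larger means farther away (assumed convention)
        'top_left'   - (row, col) of the object's patch on the canvas
        'mask'       - (h, w) array in [0, 1], the complete object silhouette
        'appearance' - (h, w, 3) array of colours for the complete object
    Patches are assumed to lie fully inside the canvas.
    """
    canvas = np.asarray(background, dtype=np.float64).copy()
    # Paint far-to-near so that nearer objects cover farther ones.
    for obj in sorted(objects, key=lambda o: o["depth"], reverse=True):
        row, col = obj["top_left"]
        h, w = obj["mask"].shape
        alpha = obj["mask"][..., None]  # broadcast the mask over colour channels
        region = canvas[row:row + h, col:col + w]
        canvas[row:row + h, col:col + w] = (
            alpha * obj["appearance"] + (1.0 - alpha) * region
        )
    return canvas
```

In this sketch each object is stored as a complete patch, and occlusion arises only at render time: where two patches overlap, the pixel takes the colour of the object with the smaller depth, while the farther object's mask and appearance remain intact in its own representation.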
Related papers
- SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects [20.978091381109294]
We propose a method to generate articulated objects from a single image.
Our method generates an articulated object that is visually consistent with the input image.
Our experiments show that our method outperforms the state-of-the-art in articulated object creation.
arXiv Detail & Related papers (2024-10-21T20:41:32Z)
- Thinking Outside the BBox: Unconstrained Generative Object Compositing [36.86960274923344]
We present a novel problem of unconstrained generative object compositing.
Our first-of-its-kind model is able to generate object effects such as shadows and reflections that go beyond the mask.
Our model outperforms existing object placement and compositing models in various quality metrics and user studies.
arXiv Detail & Related papers (2024-09-06T18:42:30Z)
- Understanding Self-Supervised Pretraining with Part-Aware Representation Learning [88.45460880824376]
We study whether self-supervised representation pretraining methods learn part-aware representations.
Results show that the fully-supervised model outperforms self-supervised models for object-level recognition.
arXiv Detail & Related papers (2023-01-27T18:58:42Z)
- Compositional Scene Modeling with Global Object-Centric Representations [44.43366905943199]
Humans can easily identify the same object, even if occlusions exist, by completing the occluded parts based on its canonical image in the memory.
This paper proposes a compositional scene modeling method to infer global representations of canonical images of objects without any supervision.
arXiv Detail & Related papers (2022-11-21T14:36:36Z)
- Unsupervised Object Learning via Common Fate [61.14802390241075]
Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling.
We decompose this problem into three easier subtasks, and provide candidate solutions for each of them.
We show that our approach allows learning generative models that generalize beyond the occlusions present in the input videos.
arXiv Detail & Related papers (2021-10-13T08:22:04Z)
- Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition [57.088328223220934]
Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real world.
In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.
arXiv Detail & Related papers (2021-04-12T11:37:23Z)
- Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer the 3D shape and pose of an object from a single image.
We first infer a volumetric representation in a canonical frame, along with the camera pose.
The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame.
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which leads to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.