Unsupervised Layered Image Decomposition into Object Prototypes
- URL: http://arxiv.org/abs/2104.14575v1
- Date: Thu, 29 Apr 2021 18:02:01 GMT
- Title: Unsupervised Layered Image Decomposition into Object Prototypes
- Authors: Tom Monnier, Elliot Vincent, Jean Ponce, Mathieu Aubry
- Abstract summary: We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models.
We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks.
We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images.
- Score: 39.20333694585477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an unsupervised learning framework for decomposing images into
layers of automatically discovered object models. Contrary to recent approaches
that model image layers with autoencoder networks, we represent them as
explicit transformations of a small set of prototypical images. Our model has
three main components: (i) a set of object prototypes in the form of learnable
images with a transparency channel, which we refer to as sprites; (ii)
differentiable parametric functions predicting occlusions and transformation
parameters necessary to instantiate the sprites in a given image; (iii) a
layered image formation model with occlusion for compositing these instances
into complete images including background. By jointly learning the sprites and
occlusion/transformation predictors to reconstruct images, our approach not
only yields accurate layered image decompositions, but also identifies object
categories and instance parameters. We first validate our approach by providing
results on par with the state of the art on standard multi-object synthetic
benchmarks (Tetrominoes, Multi-dSprites, CLEVR6). We then demonstrate the
applicability of our model to real images in tasks that include clustering
(SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from
unfiltered social network images. To the best of our knowledge, our approach is
the first layered image decomposition algorithm that learns an explicit and
shared concept of object type, and is robust enough to be applied to real
images.
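The abstract's third component, a layered image formation model with occlusion, amounts to compositing RGBA sprite instances over a background, back to front, with the standard alpha "over" operator. A minimal sketch of such occlusion-aware compositing (function and variable names are illustrative, not taken from the paper's code) might look like:

```python
import numpy as np

def composite_layers(background, layers):
    """Composite RGBA sprite layers over a background, back to front.

    background: (H, W, 3) float array with values in [0, 1]
    layers: list of (H, W, 4) float arrays (RGB + alpha), ordered
            back to front so later layers occlude earlier ones.
    """
    out = background.copy()
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        # Alpha "over" operator: each sprite occludes what lies behind it
        # in proportion to its transparency channel.
        out = alpha * rgb + (1.0 - alpha) * out
    return out
```

In the paper's setting, each layer would be a learned sprite warped by predicted transformation parameters, with the layer ordering itself predicted by the occlusion module; here the ordering is simply the list order.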
Related papers
- Variable Radiance Field for Real-Life Category-Specific Reconstruction from Single Image [27.290232027686237]
We present a novel framework that can reconstruct category-specific objects from a single image without known camera parameters.
We parameterize the geometry and appearance of the object using a multi-scale global feature extractor.
We also propose a contrastive learning-based pretraining strategy to improve the feature extractor.
arXiv Detail & Related papers (2023-06-08T12:12:02Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency [59.427074701985795]
Single-view reconstruction methods typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry.
We avoid all of these supervisions and hypotheses by explicitly leveraging the consistency between images of different object instances.
Our main contributions are two approaches to leverage cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; (ii) swap reconstruction, a loss enforcing consistency between instances having similar shape or texture.
arXiv Detail & Related papers (2022-04-21T17:47:35Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- Learning Generative Models of Textured 3D Meshes from Real-World Images [26.353307246909417]
We propose a GAN framework for generating textured triangle meshes without relying on such annotations.
We show that the performance of our approach is on par with prior work that relies on ground-truth keypoints.
arXiv Detail & Related papers (2021-03-29T14:07:37Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which leads to improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.