Self-Supervised 2D Image to 3D Shape Translation with Disentangled
Representations
- URL: http://arxiv.org/abs/2003.10016v2
- Date: Fri, 29 Jan 2021 22:55:47 GMT
- Title: Self-Supervised 2D Image to 3D Shape Translation with Disentangled
Representations
- Authors: Berk Kaya, Radu Timofte
- Abstract summary: We present a framework to translate between 2D image views and 3D object shapes.
We propose SIST, a Self-supervised Image to Shape Translation framework.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a framework to translate between 2D image views and 3D
object shapes. Recent progress in deep learning has enabled learning
structure-aware representations from a scene. However, the existing literature
assumes that paired images and 3D shapes are available for fully supervised
training. In this paper, we propose SIST, a Self-supervised Image to Shape
Translation framework that fulfills three tasks: (i) reconstructing the 3D
shape from a single image; (ii) learning disentangled representations for
shape, appearance and viewpoint; and (iii) generating a realistic RGB image
from these independent factors. In contrast to existing approaches, our method
does not require image-shape pairs for training. Instead, it uses unpaired
image and shape datasets from the same object class and jointly trains the
image generator and shape reconstruction networks. Our translation method
achieves promising results, comparable both quantitatively and qualitatively
to the state of the art achieved by fully supervised methods.
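The abstract describes an encoder that factors an image into shape, appearance,
and viewpoint codes, a shape decoder for single-image 3D reconstruction, and an
image generator that re-renders the independent factors. Below is a minimal
PyTorch sketch of that disentangled structure; all module names, latent sizes,
and the voxel output are hypothetical illustration choices, not the paper's
actual architecture.

```python
# Minimal sketch of a disentangled image-to-shape setup in the spirit of SIST.
# Everything here (module names, latent sizes, voxel output) is hypothetical.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Encodes an RGB image into separate shape, appearance, and viewpoint codes."""
    def __init__(self, z_shape=128, z_app=64, z_view=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_shape = nn.Linear(64, z_shape)  # geometry factor
        self.to_app = nn.Linear(64, z_app)      # appearance factor
        self.to_view = nn.Linear(64, z_view)    # viewpoint factor

    def forward(self, img):
        h = self.backbone(img)
        return self.to_shape(h), self.to_app(h), self.to_view(h)

class ShapeDecoder(nn.Module):
    """Decodes the shape code into a voxel occupancy grid (one possible 3D output)."""
    def __init__(self, z_shape=128, res=32):
        super().__init__()
        self.res = res
        self.fc = nn.Linear(z_shape, res ** 3)

    def forward(self, z_s):
        return torch.sigmoid(self.fc(z_s)).view(-1, 1, self.res, self.res, self.res)

encoder, decoder = ImageEncoder(), ShapeDecoder()
z_s, z_app, z_view = encoder(torch.randn(4, 3, 64, 64))  # batch of 4 RGB images
voxels = decoder(z_s)                                    # (4, 1, 32, 32, 32)
```

In the unpaired setting the abstract describes, an image generator mapping
(shape, appearance, viewpoint) back to RGB closes the loop, so separate image
and shape datasets of the same object class can supervise each other without
image-shape pairs.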
Related papers
- Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation (arXiv, 2023-06-29)
We present a novel alignment-before-generation approach to generating 3D shapes from 2D images or text.
Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM).
- Self-Supervised Image Representation Learning with Geometric Set Consistency (arXiv, 2022-03-29)
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce feature consistency within image views; a toy sketch of such a loss appears after this list.
- 3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow (arXiv, 2022-03-29)
Reconstructing a 3D shape from a single 2D image is a challenging task.
Most previous methods still struggle to extract semantic attributes for the 3D reconstruction task.
We propose 3DAttriFlow to disentangle and extract semantic attributes at different semantic levels of the input images.
- Learning Canonical 3D Object Representation for Fine-Grained Recognition (arXiv, 2021-08-10)
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns a discriminative representation of the object.
- Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction (arXiv, 2021-04-02)
Inferring the 3D structure of a generic object from a 2D image is a long-standing objective of computer vision.
We take an alternative, semi-supervised approach: we decompose a 2D image of a generic object into latent representations of category, shape, and albedo.
We show that the complete shape and albedo modeling enables us to leverage real 2D images in both modeling and model fitting.
- Cycle-Consistent Generative Rendering for 2D-3D Modality Translation (arXiv, 2020-11-16)
We learn a module that generates a realistic rendering of a 3D object and infers a realistic 3D shape from an image.
By leveraging generative domain translation methods, we are able to define a learning algorithm that requires only weak supervision, with unpaired data.
The resulting model is able to perform 3D shape, pose, and texture inference from 2D images, but can also generate novel textured 3D shapes and renderings.
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve (arXiv, 2020-07-26)
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
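The Geometric Set Consistency entry above introduces 3D geometric consistency
into contrastive learning. One common way to realize that idea, sketched below
under assumptions of our own, is to treat features of the same 3D point
observed in two views as positive pairs in an InfoNCE loss; the function name
and shapes here are hypothetical, not the paper's implementation.

```python
# Toy sketch: symmetric InfoNCE over per-point features from two views.
# Rows of feat_a and feat_b describe the SAME N 3D points; the 2D-2D
# correspondences are assumed to come from known scene geometry.
import torch
import torch.nn.functional as F

def geometric_infonce(feat_a, feat_b, temperature=0.07):
    """Matching rows are positives; all other rows serve as negatives."""
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / temperature                    # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example: 256 corresponding points with 128-dim features from two views.
loss = geometric_infonce(torch.randn(256, 128), torch.randn(256, 128))
```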
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.