3D Magic Mirror: Clothing Reconstruction from a Single Image via a
Causal Perspective
- URL: http://arxiv.org/abs/2204.13096v1
- Date: Wed, 27 Apr 2022 17:46:55 GMT
- Title: 3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective
- Authors: Zhedong Zheng and Jiayin Zhu and Wei Ji and Yi Yang and Tat-Seng Chua
- Abstract summary: This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometric shape and texture of human clothing from a single 2D image.
- Score: 96.65476492200648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometric shape and texture of human clothing from a single 2D image. Compared with existing methods, we observe that three primary challenges remain: (1) conventional template-based methods are limited in modeling non-rigid clothing objects, e.g., handbags and dresses, which are common in fashion images; (2) 3D ground-truth meshes of clothing are usually inaccessible due to annotation difficulties and time costs; (3) it remains challenging to simultaneously optimize four reconstruction factors, i.e., camera viewpoint, shape, texture, and illumination. The inherent ambiguity compromises model training, such as the dilemma between a large shape with a distant camera and a small shape with a close camera.
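To make the viewpoint-shape dilemma concrete, here is a minimal pinhole-camera argument (our illustration, not taken from the paper): under perspective projection, jointly scaling object size and camera distance leaves the image unchanged.

```latex
% Pinhole projection: an object of height $H$ at depth $Z$, viewed with
% focal length $f$, has image height $h$. Scaling shape and depth by the
% same factor $k$ produces an identical image:
\[
  h \;=\; \frac{f\,H}{Z} \;=\; \frac{f\,(kH)}{kZ}, \qquad k > 0,
\]
% so a single view cannot separate a large, distant garment from a small,
% close one without an additional prior such as a shape template.
```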
To address the above limitations, we propose a causality-aware self-supervised learning method to adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations. In particular, to resolve the inherent ambiguity among the four implicit variables, i.e., camera position, shape, texture, and illumination, we study existing works and introduce an explainable structural causal map (SCM) to build our model. The proposed model structure follows the spirit of the causal map, explicitly considering the prior template in both camera estimation and shape prediction. During optimization, the causal intervention tool, i.e., two expectation-maximization loops, is deeply embedded in our algorithm to (1) disentangle the four encoders and (2) help update the prior template. Extensive experiments on two 2D fashion benchmarks, i.e., ATR and Market-HQ, show that the proposed method yields high-fidelity 3D reconstructions. Furthermore, we verify the scalability of the proposed method on a fine-grained bird dataset, i.e., CUB.
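The abstract names two expectation-maximization loops but gives no implementation detail, so the following PyTorch sketch is purely illustrative: an inner loop that updates one factor encoder at a time while the others are frozen, and an outer loop that nudges the prior template toward the mean predicted shape. All modules, losses, and hyperparameters here are hypothetical stand-ins, not the authors' code.

```python
import torch
import torch.nn as nn

FACTORS = ("camera", "shape", "texture", "illumination")

# Toy stand-ins for the four factor encoders; a real system would use
# image encoders feeding a differentiable renderer.
encoders = nn.ModuleDict({f: nn.Linear(128, 64) for f in FACTORS})
optims = {f: torch.optim.Adam(encoders[f].parameters(), lr=1e-3) for f in FACTORS}
prior_template = torch.zeros(64)  # hypothetical placeholder for the mesh template

def reconstruction_loss(feats, template):
    # Placeholder objective: a real loss would render the predicted shape,
    # texture, and illumination from the predicted camera and compare
    # against the input image. Here we simply combine the four factors.
    preds = [encoders[f](feats) for f in FACTORS]
    recon = sum(preds) + template
    return recon.pow(2).mean()

for outer in range(5):                      # loop 2: prior-template refresh
    for inner in range(100):                # loop 1: encoder disentanglement
        feats = torch.randn(16, 128)        # stand-in for image features
        active = FACTORS[inner % len(FACTORS)]
        for f in FACTORS:                   # freeze all but one encoder
            encoders[f].requires_grad_(f == active)
        loss = reconstruction_loss(feats, prior_template)
        optims[active].zero_grad()
        loss.backward()
        optims[active].step()
    with torch.no_grad():                   # M-like step: move the template
        feats = torch.randn(256, 128)       # toward the mean predicted shape
        prior_template = 0.9 * prior_template + 0.1 * encoders["shape"](feats).mean(0)
```

Freezing all but one encoder per step forces each factor to explain only its own share of the reconstruction error, which is one plausible reading of how the two EM loops disentangle the four variables and keep the template update stable.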
Related papers
- Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors [24.478875248825563]
We propose a novel image editing technique that enables 3D manipulations on single images.
Our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs.
Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image.
arXiv Detail & Related papers (2024-03-18T06:18:59Z)
- Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors [15.34487368683311]
We propose a framework that can reconstruct high-quality object-level maps for unknown objects.
Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses for detected objects.
We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions.
arXiv Detail & Related papers (2023-09-17T00:48:19Z)
- LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction [5.107705550575662]
LIST is a novel neural architecture that leverages local and global image features to reconstruct the geometric and topological structure of a 3D object from a single image.
We show the superiority of our model against the state of the art in reconstructing 3D objects from both synthetic and real-world images.
arXiv Detail & Related papers (2023-07-23T01:01:27Z)
- One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization [30.951405623906258]
Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world.
We propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass.
arXiv Detail & Related papers (2023-06-29T13:28:16Z)
- Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning [70.75369367311897]
3D-aware global correspondences are reliable flows that jointly encode global semantic correlations, local deformations, and geometric priors of 3D human bodies.
An adversarial generator takes the garment warped by the 3D-aware flow, and the image of the target person, as inputs to synthesize the photo-realistic try-on result.
arXiv Detail & Related papers (2022-11-25T12:16:21Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), uses the established geometry to lift 2D image features to 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian [58.704089101826774]
We present a 3D-aware image deformation method with minimal restrictions on shape category and deformation type.
We take a supervised learning-based approach to predict the shape Laplacian of the underlying volume of a 3D reconstruction represented as a point cloud.
In the experiments, we present our results of deforming 2D character and clothed human images.
arXiv Detail & Related papers (2022-03-29T04:57:18Z)
- Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs [156.1209884183522]
State-of-the-art 2D generative models like GANs show unprecedented quality in modeling the natural image manifold.
We present the first attempt to directly mine 3D geometric cues from an off-the-shelf 2D GAN that is trained on RGB images only.
arXiv Detail & Related papers (2020-11-02T09:38:43Z)
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture, and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object, or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)