Temporal Consistency Loss for High Resolution Textured and Clothed 3D Human Reconstruction from Monocular Video
- URL: http://arxiv.org/abs/2104.09259v1
- Date: Mon, 19 Apr 2021 13:04:29 GMT
- Title: Temporal Consistency Loss for High Resolution Textured and Clothed 3D Human Reconstruction from Monocular Video
- Authors: Akin Caliskan, Armin Mustafa, Adrian Hilton
- Abstract summary: We present a novel method to learn temporally consistent 3D reconstruction of clothed people from a monocular video.
The proposed advances improve the temporal consistency and accuracy of both the 3D reconstruction and texture prediction from a monocular video.
- Score: 35.42021156572568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel method to learn temporally consistent 3D reconstruction of
clothed people from a monocular video. Recent methods for 3D human
reconstruction from monocular video using volumetric, implicit, or parametric
human shape models produce per-frame reconstructions, yielding temporally
inconsistent output and limited performance when applied to video. In this
paper, we introduce an approach to learn temporally consistent features for
textured reconstruction of clothed 3D human sequences from monocular video by
proposing two advances: a novel temporal consistency loss function; and hybrid
representation learning for implicit 3D reconstruction from 2D images and
coarse 3D geometry. The proposed advances improve the temporal consistency and
accuracy of both the 3D reconstruction and texture prediction from a monocular
video. Comprehensive comparative performance evaluation on images of people
demonstrates that the proposed method significantly outperforms
state-of-the-art learning-based single-image 3D human shape estimation
approaches, achieving improved reconstruction accuracy, completeness, quality,
and temporal consistency.
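As an illustration of the kind of temporal consistency loss the abstract describes, the sketch below penalises disagreement between the features and implicit predictions of consecutive frames. This is a minimal sketch under assumed L2/L1 forms; the paper's exact loss, network outputs, and all names below are illustrative rather than taken from the paper.

    import torch
    import torch.nn.functional as F

    def temporal_consistency_loss(feat_t, feat_t1, pred_t, pred_t1,
                                  w_feat=1.0, w_pred=1.0):
        """Penalise disagreement between consecutive frames t and t+1.

        feat_*: per-frame feature embeddings (B, C).
        pred_*: per-frame implicit predictions, e.g. sampled occupancy or
        texture values (B, N). Simple L2/L1 penalties are assumed here
        purely for illustration.
        """
        loss_feat = F.mse_loss(feat_t, feat_t1)   # feature smoothness over time
        loss_pred = F.l1_loss(pred_t, pred_t1)    # prediction smoothness over time
        return w_feat * loss_feat + w_pred * loss_pred

    # Toy usage with random tensors standing in for network outputs.
    feat_t, feat_t1 = torch.randn(2, 256), torch.randn(2, 256)
    pred_t, pred_t1 = torch.rand(2, 5000), torch.rand(2, 5000)
    loss = temporal_consistency_loss(feat_t, feat_t1, pred_t, pred_t1)

In practice such a term would be added to the per-frame reconstruction and texture losses, so the network is rewarded for producing smoothly varying geometry and appearance across frames rather than independent per-frame estimates.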
Related papers
- Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review [0.08823202672546056]
This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views.
An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies.
arXiv Detail & Related papers (2024-05-06T12:32:38Z)
- The More You See in 2D, the More You Perceive in 3D [32.578628729549145]
SAP3D is a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.
We show that as the number of input images increases, the performance of our approach improves.
arXiv Detail & Related papers (2024-04-04T17:59:40Z)
- Temporal-Aware Refinement for Video-based Human Pose and Shape Recovery [20.566505924677013]
We propose a temporal-aware refining network (TAR) to explore temporal-aware global and local image features for accurate pose and shape recovery.
Our TAR obtains more accurate results than previous state-of-the-art methods on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-11-16T03:35:17Z)
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z)
- HiFi-123: Towards High-fidelity One Image to 3D Content Generation [64.81863143986384]
HiFi-123 is a method designed for high-fidelity and multi-view consistent 3D generation.
We present a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods.
We also present a novel Reference-Guided State Distillation (RGSD) loss.
arXiv Detail & Related papers (2023-10-10T16:14:20Z)
- Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion [67.71624118802411]
We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects.
We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data.
Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.
arXiv Detail & Related papers (2023-04-20T17:59:34Z)
- ReFu: Refine and Fuse the Unobserved View for Detail-Preserving Single-Image 3D Human Reconstruction [31.782985891629448]
Single-image 3D human reconstruction aims to reconstruct the 3D textured surface of the human body given a single image.
We propose ReFu, a coarse-to-fine approach that refines the projected backside view image and fuses the refined image to predict the final human body.
arXiv Detail & Related papers (2022-11-09T09:14:11Z)
- State of the Art in Dense Monocular Non-Rigid 3D Reconstruction [100.9586977875698]
3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics.
This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views.
arXiv Detail & Related papers (2022-10-27T17:59:53Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method for representing the self-occlusions of 3D foreground objects as a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z)
- Model-based 3D Hand Reconstruction via Self-Supervised Learning [72.0817813032385]
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity.
We propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint.
For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations.
arXiv Detail & Related papers (2021-03-22T10:12:43Z)
- Multi-View Consistency Loss for Improved Single-Image 3D Reconstruction of Clothed People [36.30755368202957]
We present a novel method to improve the accuracy of the 3D reconstruction of clothed human shape from a single image.
The accuracy and completeness of clothed-people reconstruction are limited due to the large variation in shape resulting from clothing, hair, body size, pose, and camera viewpoint.
arXiv Detail & Related papers (2020-09-29T17:18:00Z)