PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
- URL: http://arxiv.org/abs/2004.00452v1
- Date: Wed, 1 Apr 2020 13:52:53 GMT
- Title: PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
- Authors: Shunsuke Saito, Tomas Simon, Jason Saragih, Hanbyul Joo
- Abstract summary: Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks.
We argue that this limitation stems primarily from two conflicting requirements: accurate predictions require large context, but precise predictions require high resolution.
We formulate a multi-level architecture that is end-to-end trainable. A coarse level observes the whole image at lower resolution and focuses on holistic reasoning.
We demonstrate that our approach significantly outperforms existing state-of-the-art techniques on single image human shape reconstruction by fully leveraging 1k-resolution input images.
- Score: 38.6438956631446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in image-based 3D human shape estimation have been driven by
the significant improvement in representation power afforded by deep neural
networks. Although current approaches have demonstrated the potential in real
world settings, they still fail to produce reconstructions with the level of
detail often present in the input images. We argue that this limitation stems
primarily from two conflicting requirements: accurate predictions require large
context, but precise predictions require high resolution. Due to memory
limitations in current hardware, previous approaches tend to take low
resolution images as input to cover large spatial context, and produce less
precise (or low resolution) 3D estimates as a result. We address this
limitation by formulating a multi-level architecture that is end-to-end
trainable. A coarse level observes the whole image at lower resolution and
focuses on holistic reasoning. This provides context to a fine level, which
estimates highly detailed geometry by observing higher-resolution images. We
demonstrate that our approach significantly outperforms existing
state-of-the-art techniques on single image human shape reconstruction by fully
leveraging 1k-resolution input images.
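The abstract's two-level design can be sketched compactly: 3D query points are projected into the image, per-pixel features are sampled at the projected locations ("pixel alignment"), and an MLP predicts occupancy; the fine level conditions on the coarse level's intermediate embedding instead of raw depth. The following PyTorch sketch is illustrative only; the module names, feature dimensions, stand-in convolutional backbones, and the orthographic projection are assumptions, not the authors' released code.

```python
# Minimal sketch of a two-level pixel-aligned implicit function.
# All architecture details here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def index_features(feat, xy):
    """Sample per-pixel features at projected 2D locations.
    feat: (B, C, H, W) image feature map
    xy:   (B, N, 2) coordinates normalized to [-1, 1]
    returns (B, C, N) pixel-aligned features."""
    samples = F.grid_sample(feat, xy.unsqueeze(2),
                            mode="bilinear", align_corners=True)  # (B, C, N, 1)
    return samples.squeeze(-1)

class MLP(nn.Module):
    def __init__(self, dims):
        super().__init__()
        layers = []
        for i in range(len(dims) - 1):
            layers.append(nn.Conv1d(dims[i], dims[i + 1], 1))
            if i < len(dims) - 2:
                layers.append(nn.LeakyReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class MultiLevelPIFu(nn.Module):
    """Coarse level: whole image at low resolution for holistic context.
    Fine level: high-resolution features plus the coarse level's
    intermediate embedding, for detailed geometry."""

    def __init__(self, c_coarse=256, c_fine=64, c_hidden=128):
        super().__init__()
        self.enc_coarse = nn.Conv2d(3, c_coarse, 4, stride=4)  # stand-in backbone
        self.enc_fine = nn.Conv2d(3, c_fine, 4, stride=4)      # stand-in backbone
        self.mlp_coarse = MLP([c_coarse + 1, c_hidden, c_hidden])
        self.head_coarse = nn.Conv1d(c_hidden, 1, 1)
        self.mlp_fine = MLP([c_fine + c_hidden, c_hidden, 1])

    def forward(self, img_lo, img_hi, pts):
        # pts: (B, N, 3) query points; an orthographic camera is assumed,
        # so projection is just (x, y) and z serves as the depth feature.
        xy, z = pts[..., :2], pts[..., 2:].transpose(1, 2)  # (B,N,2), (B,1,N)
        f_lo = index_features(self.enc_coarse(img_lo), xy)
        h = self.mlp_coarse(torch.cat([f_lo, z], dim=1))    # coarse embedding
        occ_coarse = torch.sigmoid(self.head_coarse(h))
        f_hi = index_features(self.enc_fine(img_hi), xy)
        occ_fine = torch.sigmoid(self.mlp_fine(torch.cat([f_hi, h], dim=1)))
        return occ_coarse, occ_fine

model = MultiLevelPIFu()
img_lo = torch.randn(1, 3, 512, 512)    # downsampled full image
img_hi = torch.randn(1, 3, 1024, 1024)  # 1k-resolution input
pts = torch.rand(1, 4096, 3) * 2 - 1    # query points in [-1, 1]^3
occ_c, occ_f = model(img_lo, img_hi, pts)
print(occ_c.shape, occ_f.shape)         # (1, 1, 4096) each
```

In the full PIFuHD pipeline the image encoders are stacked-hourglass networks and the fine level additionally consumes predicted front- and back-side normal maps; the sketch above keeps only the two-level feature-conditioning structure that the abstract describes. A surface is extracted by evaluating the fine-level occupancy on a dense grid and running marching cubes.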
Related papers
- SRGS: Super-Resolution 3D Gaussian Splatting [14.26021476067791]
We propose Super-Resolution 3D Gaussian Splatting (SRGS) to perform the optimization in a high-resolution (HR) space.
The sub-pixel constraint is introduced for the increased viewpoints in HR space, exploiting the sub-pixel cross-view information of the multiple low-resolution (LR) views.
Our method achieves high rendering quality on HRNVS only with LR inputs, outperforming state-of-the-art methods on challenging datasets such as Mip-NeRF 360 and Tanks & Temples.
arXiv Detail & Related papers (2024-04-16T06:58:30Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction [17.254539604491303]
In this paper, we address the problem of few-shot full 3D head reconstruction.
We accomplish this by incorporating a probabilistic shape and appearance prior into coordinate-based representations.
We extend the H3DS dataset, which now comprises 60 high-resolution 3D full head scans and their corresponding posed images and masks.
arXiv Detail & Related papers (2023-10-12T07:35:30Z)
- Refining 3D Human Texture Estimation from a Single Image [3.8761064607384195]
Estimating 3D human texture from a single image is essential in graphics and vision.
We propose a framework that adaptively samples the input by a deformable convolution where offsets are learned via a deep neural network.
arXiv Detail & Related papers (2023-03-06T19:53:50Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Super-resolution 3D Human Shape from a Single Low-Resolution Image [33.70299493354903]
We propose a novel framework to reconstruct super-resolution human shape from a single low-resolution input image.
The proposed framework represents the reconstructed shape with a high-detail implicit function.
arXiv Detail & Related papers (2022-08-23T05:24:39Z)
- H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction [27.66008315400462]
Recent learning approaches that implicitly represent surface geometry have shown impressive results in the problem of multi-view 3D reconstruction.
We tackle these limitations for the specific problem of few-shot full 3D head reconstruction.
We learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations.
arXiv Detail & Related papers (2021-07-26T23:04:18Z)
- InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z)
- 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos [107.36352212367179]
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
The proposed method is able to learn 3D body pose and shape across different resolutions with one single model.
We extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input.
arXiv Detail & Related papers (2021-03-11T06:52:12Z)
- PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models [77.32079593577821]
PULSE (Photo Upsampling via Latent Space Exploration) generates high-resolution, realistic images at resolutions previously unseen in the literature.
Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.
arXiv Detail & Related papers (2020-03-08T16:44:31Z)