Pixel2ISDF: Implicit Signed Distance Fields based Human Body Model from
Multi-view and Multi-pose Images
- URL: http://arxiv.org/abs/2212.02765v1
- Date: Tue, 6 Dec 2022 05:30:49 GMT
- Authors: Jianchuan Chen, Wentao Yi, Tiantian Wang, Xing Li, Liqian Ma, Yangyu
Fan, Huchuan Lu
- Abstract summary: We focus on reconstructing clothed humans in the canonical space given multiple views and poses of a human as the input.
We learn latent codes on the posed mesh by leveraging multiple input images and then assign the latent codes to the mesh in the canonical space.
Our work for reconstructing the human shape in the canonical pose achieves 3rd place in the WCPA MVP-Human Body Challenge.
- Score: 67.45882013828256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we focus on reconstructing clothed humans in the canonical
space given multiple views and poses of a human as the input. To achieve this,
we utilize the geometric prior of the SMPLX model in the canonical space to
learn the implicit representation for geometry reconstruction. Based on the
observation that the topology between the posed mesh and the mesh in the
canonical space is consistent, we propose to learn latent codes on the posed
mesh by leveraging multiple input images and then assign the latent codes to
the mesh in the canonical space. Specifically, we first leverage normal and
geometry networks to extract the feature vector for each vertex on the SMPLX
mesh. Normal maps are adopted instead of raw RGB images because they generalize
better to unseen images. Then, features for each vertex on the posed mesh from
multiple images are integrated by MLPs. The integrated features acting as the
latent code are anchored to the SMPLX mesh in the canonical space. Finally,
latent code for each 3D point is extracted and utilized to calculate the SDF.
Our work for reconstructing the human shape in the canonical pose achieves 3rd
place in the WCPA MVP-Human Body Challenge.
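The pipeline described above (per-vertex features from multiple views, MLP fusion into latent codes anchored to the canonical SMPLX mesh, then per-point SDF regression) can be sketched as follows. All dimensions, random weights, and the nearest-vertex lookup are simplified stand-ins, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: V SMPLX vertices, K input views, C-dim features.
V, K, C = 10475, 4, 32

# Per-view, per-vertex features (in the paper these come from normal and
# geometry networks applied to each posed image; here they are random stand-ins).
per_view_feats = rng.standard_normal((K, V, C))

def mlp(x, w1, b1, w2, b2):
    """Tiny two-layer MLP with ReLU, standing in for the fusion/SDF networks."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

# Fusion MLP: maps each view's feature to C dims, then views are averaged,
# giving one latent code per vertex anchored to the canonical SMPLX mesh.
w1, b1 = rng.standard_normal((C, 64)) * 0.1, np.zeros(64)
w2, b2 = rng.standard_normal((64, C)) * 0.1, np.zeros(C)
fused = mlp(per_view_feats, w1, b1, w2, b2).mean(axis=0)   # (V, C)

# SDF head: for a query 3D point, gather the latent code of its nearest
# canonical vertex (a simplification of the paper's feature extraction)
# and regress a scalar signed distance.
canon_verts = rng.standard_normal((V, 3))
sw1, sb1 = rng.standard_normal((C + 3, 64)) * 0.1, np.zeros(64)
sw2, sb2 = rng.standard_normal((64, 1)) * 0.1, np.zeros(1)

def query_sdf(p):
    i = np.argmin(np.linalg.norm(canon_verts - p, axis=1))  # nearest vertex
    x = np.concatenate([fused[i], p])
    return mlp(x, sw1, sb1, sw2, sb2)[0]

sdf_val = query_sdf(np.array([0.1, 0.2, 0.3]))
print(fused.shape, sdf_val)
```

In the real system the networks are trained end-to-end and the latent code for a query point is interpolated from nearby mesh vertices rather than copied from the single nearest one.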
Related papers
- CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images [17.10258463020844]
We present a novel framework for reconstructing animatable human avatars from multiple images, termed CanonicalFusion.
We first predict Linear Blend Skinning (LBS) weight maps and depth maps using a shared-encoder-dual-decoder network, enabling direct canonicalization of the 3D mesh from the predicted depth maps.
We also introduce a forward-skinning-based differentiable rendering scheme to merge the reconstructed results from multiple images.
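The predicted LBS weight maps drive linear blend skinning, where each vertex is deformed by a weighted sum of per-joint rigid transforms (canonicalization applies the inverse of the blended transform). A minimal sketch with toy data, not CanonicalFusion's code:

```python
import numpy as np

def lbs(verts, weights, transforms):
    """Linear blend skinning.
    verts: (V,3), weights: (V,J), transforms: (J,4,4) -> posed verts (V,3)."""
    homo = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)  # (V,4)
    # Blend the per-joint transforms for each vertex, then apply.
    blended = np.einsum('vj,jab->vab', weights, transforms)           # (V,4,4)
    posed = np.einsum('vab,vb->va', blended, homo)
    return posed[:, :3]

# Two joints: identity and a translation by (1,0,0).
T = np.stack([np.eye(4), np.eye(4)])
T[1, 0, 3] = 1.0
verts = np.zeros((3, 3))
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
posed = lbs(verts, weights, T)
print(posed[:, 0])  # x-coords: 0.0, 0.5, 1.0
```

A vertex weighted half-and-half between the two joints moves halfway, which is exactly the blending that the predicted weight maps control per pixel.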
arXiv Detail & Related papers (2024-07-05T08:36:26Z)
- HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos [52.23323966700072]
We present a framework for acquiring human avatars that are attached with high-resolution physically-based material textures and mesh from monocular video.
Our method introduces a novel information fusion strategy to combine the information from the monocular video and synthesize virtual multi-view images.
Experiments show that our approach outperforms previous representations in terms of fidelity, and the explicit triangular-mesh output supports deployment on common renderers.
arXiv Detail & Related papers (2024-05-18T11:49:09Z)
- GALA: Generating Animatable Layered Assets from a Single Scan [20.310367593475508]
We present GALA, a framework that takes as input a single-layer clothed 3D human mesh and decomposes it into complete multi-layered 3D assets.
The outputs can then be combined with other assets to create novel clothed human avatars with any pose.
arXiv Detail & Related papers (2024-01-23T18:59:59Z)
- Weakly-Supervised 3D Reconstruction of Clothed Humans via Normal Maps [1.6462601662291156]
We present a novel deep learning-based approach to the 3D reconstruction of clothed humans using weak supervision via 2D normal maps.
Given a single RGB image or multiview images, our network infers a signed distance function (SDF) discretized on a tetrahedral mesh surrounding the body in a rest pose.
We demonstrate the efficacy of our approach for both network inference and 3D reconstruction.
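An SDF discretized on a tetrahedral mesh can be evaluated at any interior point by barycentric interpolation of the values stored at the tetrahedron's vertices. A toy illustration of that idea on a single tetrahedron (hypothetical node values, not the paper's code):

```python
import numpy as np

def bary_coords(p, tet):
    """Barycentric coordinates of point p w.r.t. tetrahedron tet (4,3)."""
    A = np.vstack([tet.T, np.ones(4)])   # 4x4 system: coords + partition of unity
    b = np.append(p, 1.0)
    return np.linalg.solve(A, b)

tet = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
sdf_at_verts = np.array([-1.0, 1.0, 1.0, 1.0])   # hypothetical node values
p = np.array([0.25, 0.25, 0.25])

lam = bary_coords(p, tet)          # (0.25, 0.25, 0.25, 0.25) at this point
sdf_p = lam @ sdf_at_verts         # 0.25*(-1+1+1+1) = 0.5
print(sdf_p)
```

The interpolated field is piecewise linear over the tetrahedra, which is what makes this discretization convenient for weakly supervised training.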
arXiv Detail & Related papers (2023-11-27T18:06:35Z)
- Neural Capture of Animatable 3D Human from Monocular Video [38.974181971541846]
We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.
Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy.
arXiv Detail & Related papers (2022-08-18T09:20:48Z)
- Facial Depth and Normal Estimation using Single Dual-Pixel Camera [81.02680586859105]
We introduce a DP-oriented Depth/Normal network that reconstructs 3D facial geometry.
The accompanying dataset contains corresponding ground-truth 3D models, including depth maps and surface normals in metric scale.
It achieves state-of-the-art performances over recent DP-based depth/normal estimation methods.
arXiv Detail & Related papers (2021-11-25T05:59:27Z)
- A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views [49.03830902235915]
Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning.
This paper proposes to rely on viewpoint-variant reconstructions obtained by merging the visible information from the given views.
To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
arXiv Detail & Related papers (2020-11-17T09:59:32Z)
- SofGAN: A Portrait Image Generator with Dynamic Styling [47.10046693844792]
Generative Adversarial Networks (GANs) have been widely used for portrait image generation.
We propose a SofGAN image generator to decouple the latent space of portraits into two subspaces.
We show that our system can generate high quality portrait images with independently controllable geometry and texture attributes.
arXiv Detail & Related papers (2020-07-07T20:28:47Z)
- Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction [97.3274868990133]
Geo-PIFu is a method to recover a 3D mesh from a monocular color image of a clothed person.
We show that, by both encoding query points and constraining global shape using latent voxel features, the reconstruction we obtain for clothed human meshes exhibits less shape distortion and improved surface details compared to competing methods.
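Conditioning an implicit function on latent voxel features amounts to sampling a 3D feature grid at each continuous query point, typically by trilinear interpolation. A toy sketch of that lookup with random data (illustrative only, not Geo-PIFu's code):

```python
import numpy as np

def trilinear_sample(grid, p):
    """Sample a (D,H,W,C) feature grid at continuous coords p = (z, y, x)."""
    z0, y0, x0 = (int(np.floor(c)) for c in p)
    dz, dy, dx = p[0] - z0, p[1] - y0, p[2] - x0
    out = np.zeros(grid.shape[-1])
    # Accumulate the 8 corner features, weighted by their trilinear weights.
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                w = ((dz if i else 1 - dz) * (dy if j else 1 - dy)
                     * (dx if k else 1 - dx))
                out += w * grid[z0 + i, y0 + j, x0 + k]
    return out

rng = np.random.default_rng(0)
grid = rng.standard_normal((4, 4, 4, 8))   # hypothetical latent voxel grid
q = trilinear_sample(grid, (1.5, 2.0, 0.5))
print(q.shape)
```

Each query point thus receives a smooth blend of its eight surrounding voxel features, which is combined with pixel-aligned image features before decoding occupancy or distance.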
arXiv Detail & Related papers (2020-06-15T01:11:48Z)
- 3D Human Mesh Regression with Dense Correspondence [95.92326689172877]
Estimating 3D mesh of the human body from a single 2D image is an important task with many applications such as augmented reality and Human-Robot interaction.
Prior works reconstructed the 3D mesh from a global image feature extracted by a convolutional neural network (CNN), in which the dense correspondences between the mesh surface and the image pixels are missing.
This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space.
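Establishing dense correspondence means each mesh vertex can gather a local image feature at its own 2D location instead of sharing one global vector, typically via bilinear sampling of a CNN feature map. A toy sketch of that per-vertex lookup (hypothetical UVs and feature map; DecoMR's actual learned transfer to UV space is more involved):

```python
import numpy as np

def bilinear_sample(feat, uv):
    """Sample an (H,W,C) feature map at continuous (u, v) in [0,1]^2."""
    H, W, _ = feat.shape
    x, y = uv[0] * (W - 1), uv[1] * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * feat[y0, x0] + dx * (1 - dy) * feat[y0, x1]
            + (1 - dx) * dy * feat[y1, x0] + dx * dy * feat[y1, x1])

rng = np.random.default_rng(0)
feat_map = rng.standard_normal((16, 16, 8))   # CNN feature map stand-in
vertex_uvs = rng.uniform(size=(100, 2))       # hypothetical per-vertex UVs
vertex_feats = np.stack([bilinear_sample(feat_map, uv) for uv in vertex_uvs])
print(vertex_feats.shape)
```

The resulting per-vertex local features are what give such methods sharper, pixel-aligned geometry than global-feature regression.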
arXiv Detail & Related papers (2020-06-10T08:50:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.