Related papers: SAT: Supervisor Regularization and Animation Augmentation for Two-process Monocular Texture 3D Human Reconstruction

SAT: Supervisor Regularization and Animation Augmentation for Two-process Monocular Texture 3D Human Reconstruction

URL: http://arxiv.org/abs/2508.19688v1
Date: Wed, 27 Aug 2025 08:52:35 GMT
Title: SAT: Supervisor Regularization and Animation Augmentation for Two-process Monocular Texture 3D Human Reconstruction
Authors: Gangjian Zhang, Jian Shu, Nanjie Yao, Hao Wang,
Abstract summary: Monocular texture 3D human reconstruction aims to create a complete 3D digital avatar from just a single front-view human RGB image.<n>We propose a two-process 3D human reconstruction framework, SAT, which seamlessly learns various prior geometries in a unified manner.<n>We also propose an Online Animation Augmentation module to tackle data scarcity and improve reconstruction quality.
Score: 7.584417190255802
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monocular texture 3D human reconstruction aims to create a complete 3D digital avatar from just a single front-view human RGB image. However, the geometric ambiguity inherent in a single 2D image and the scarcity of 3D human training data are the main obstacles limiting progress in this field. To address these issues, current methods employ prior geometric estimation networks to derive various human geometric forms, such as the SMPL model and normal maps. However, they struggle to integrate these modalities effectively, leading to view inconsistencies, such as facial distortions. To this end, we propose a two-process 3D human reconstruction framework, SAT, which seamlessly learns various prior geometries in a unified manner and reconstructs high-quality textured 3D avatars as the final output. To further facilitate geometry learning, we introduce a Supervisor Feature Regularization module. By employing a multi-view network with the same structure to provide intermediate features as training supervision, these varied geometric priors can be better fused. To tackle data scarcity and further improve reconstruction quality, we also propose an Online Animation Augmentation module. By building a one-feed-forward animation network, we augment a massive number of samples from the original 3D human data online for model training. Extensive experiments on two benchmarks show the superiority of our approach compared to state-of-the-art methods.

Related papers

MultiGO++: Monocular 3D Clothed Human Reconstruction via Geometry-Texture Collaboration [10.85658775835694]
Monocular 3D clothed human reconstruction aims to generate a complete and realistic textured 3D avatar from a single image.<n>Existing methods are commonly trained under multi-view supervision with annotated geometric priors, and during inference, these priors are estimated by the pre-trained network from the monocular input.<n>We propose a novel reconstruction framework, named MultiGO++, which achieves effective systematic geometry-texture collaboration.
arXiv Detail & Related papers (2026-03-05T09:37:55Z)
Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image [10.36303976374455]
Existing approaches often rely on fine-tuning pretrained 2D diffusion models or directly generating 3D information through fast network inference.<n>We present a novel method that seamlessly integrates geometry and perception priors without requiring additional model training.<n>Experiments demonstrate the higher-fidelity reconstruction results of our method, outperforming existing methods on novel view synthesis and 3D reconstruction.
arXiv Detail & Related papers (2025-06-26T11:22:06Z)
Unify3D: An Augmented Holistic End-to-end Monocular 3D Human Reconstruction via Anatomy Shaping and Twins Negotiating [4.708237200844732]
This paper introduces a novel paradigm that treats human reconstruction as a holistic process.<n>We propose a novel reconstruction framework consisting of two core components: the Anatomy Shaping Extraction module and the Twins Negotiating Reconstruction U-Net.<n>We also propose a Comic Data Augmentation strategy and construct 15k+ 3D human scans to bolster model performance in more complex case input.
arXiv Detail & Related papers (2025-04-25T09:49:23Z)
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration [29.03216532351979]
We introduce textbfHumanDreamer-X, a novel framework that integrates multi-view human generation and reconstruction into a unified pipeline.<n>In this framework, 3D Gaussian Splatting serves as an explicit 3D representation to provide initial geometry and appearance priority.<n>We also propose an attention modulation strategy that effectively enhances geometric details identity consistency across multi-view.
arXiv Detail & Related papers (2025-04-04T15:35:14Z)
FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis [51.193297565630886]
The challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images. This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets. We propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization.
arXiv Detail & Related papers (2024-10-13T01:25:05Z)
Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors [56.192682114114724]
Get3DHuman is a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes. Our key observation is that the 3D generator can profit from human-related priors learned through 2D human generators and 3D reconstructors.
arXiv Detail & Related papers (2023-02-02T15:37:46Z)
AvatarGen: A 3D Generative Model for Animatable Human Avatars [108.11137221845352]
AvatarGen is an unsupervised generation of 3D-aware clothed humans with various appearances and controllable geometries. Our method can generate animatable 3D human avatars with high-quality appearance and geometry modeling. It is competent for many applications, e.g., single-view reconstruction, re-animation, and text-guided synthesis/editing.
arXiv Detail & Related papers (2022-11-26T15:15:45Z)
Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images. This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories. We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape [77.95154911528365]
3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D priori. Previous reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry. This paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person.
arXiv Detail & Related papers (2022-04-09T03:46:18Z)
NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks. Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image. To deal with the lack-of-data problem that no paired data exist, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z)
Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image. We rely on a recently introduced, expressivefull body statistical 3d human model, GHUM, trained end-to-end. Central to our methodology, is a learning to learn and optimize approach, referred to as HUmanNeural Descent (HUND), which avoids both second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.