InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting
- URL: http://arxiv.org/abs/2601.02098v1
- Date: Mon, 05 Jan 2026 13:26:02 GMT
- Title: InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting
- Authors: Jinlong Fan, Shanshan Zhao, Liang Zheng, Jing Zhang, Yuxiang Yang, Mingming Gong
- Abstract summary: InpaintHuman is a novel method for generating high-fidelity, complete, and animatable avatars from occluded monocular videos. Our approach employs direct pixel-level supervision to ensure identity fidelity.
- Score: 64.42884719282323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing complete and animatable 3D human avatars from monocular videos remains challenging, particularly under severe occlusions. While 3D Gaussian Splatting has enabled photorealistic human rendering, existing methods struggle with incomplete observations, often producing corrupted geometry and temporal inconsistencies. We present InpaintHuman, a novel method for generating high-fidelity, complete, and animatable avatars from occluded monocular videos. Our approach introduces two key innovations: (i) a multi-scale UV-parameterized representation with hierarchical coarse-to-fine feature interpolation, enabling robust reconstruction of occluded regions while preserving geometric details; and (ii) an identity-preserving diffusion inpainting module that integrates textual inversion with semantic-conditioned guidance for subject-specific, temporally coherent completion. Unlike SDS-based methods, our approach employs direct pixel-level supervision to ensure identity fidelity. Experiments on synthetic benchmarks (PeopleSnapshot, ZJU-MoCap) and real-world scenarios (OcMotion) demonstrate competitive performance with consistent improvements in reconstruction quality across diverse poses and viewpoints.
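The multi-scale UV-parameterized representation in (i) can be sketched as follows. This is a minimal illustrative reading based only on the abstract, not the authors' implementation: feature grids at several UV resolutions are bilinearly sampled at each query's UV coordinate and combined coarse-to-fine, so coarse grids supply a robust global prior for occluded regions while finer grids add geometric detail. All names, grid resolutions, and the summation scheme here are assumptions.

```python
import numpy as np

def bilinear_sample(grid, uv):
    """Bilinearly sample a (H, W, C) feature grid at continuous UV coords in [0, 1].

    uv has shape (N, 2); returns an (N, C) array of interpolated features.
    """
    H, W, _ = grid.shape
    x = uv[:, 0] * (W - 1)
    y = uv[:, 1] * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    # Interpolate along x on the top and bottom rows, then along y.
    top = grid[y0, x0] * (1 - wx)[:, None] + grid[y0, x1] * wx[:, None]
    bot = grid[y1, x0] * (1 - wx)[:, None] + grid[y1, x1] * wx[:, None]
    return top * (1 - wy)[:, None] + bot * wy[:, None]

def multiscale_uv_features(uv, grids):
    """Combine features sampled from coarse-to-fine UV grids by summation.

    Coarse grids are smooth and remain well-constrained under occlusion;
    fine grids contribute high-frequency detail where observations exist.
    """
    return sum(bilinear_sample(g, uv) for g in grids)

# Toy usage: three UV query points against 16-channel grids at three scales.
rng = np.random.default_rng(0)
grids = [rng.standard_normal((r, r, 16)) for r in (8, 32, 128)]
uv = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]])
feats = multiscale_uv_features(uv, grids)
print(feats.shape)  # (3, 16)
```

In practice such grids would be learned parameters optimized alongside the Gaussian avatar, with the coarse levels regularizing the occluded UV regions; the fixed random grids above only demonstrate the sampling path.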
Related papers
- Bringing Your Portrait to 3D Presence [46.11577347349078]
We present a unified framework for reconstructing animatable 3D human avatars from a single portrait across head, half-body, and full-body inputs. Our method tackles three bottlenecks: pose- and framing-sensitive feature representations, limited scalable data, and unreliable proxy-mesh estimation.
arXiv Detail & Related papers (2025-11-27T15:42:07Z)
- HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics [60.737929335600015]
We present HumanGenesis, a framework that integrates geometric and generative modeling through four collaborative agents. HumanGenesis achieves state-of-the-art performance on tasks including text-guided synthesis, video reenactment, and novel-pose generalization.
arXiv Detail & Related papers (2025-08-13T14:50:19Z)
- PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images [23.745241278910946]
PF-LHM is a large human reconstruction model that generates high-quality 3D avatars in seconds from one or multiple casually captured pose-free images. Our method unifies single- and multi-image 3D human reconstruction, achieving high-fidelity and animatable 3D human avatars without requiring camera or human pose annotations.
arXiv Detail & Related papers (2025-06-16T17:59:56Z)
- CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image [37.16845070245751]
We present a novel pipeline to reconstruct 3D humans with multiview consistency from a single occluded image. A 3D reconstruction model is then trained to predict a set of 3D Gaussians conditioned on both the occluded input and synthesized views. CHROME achieves significant improvements in both novel view synthesis (up to 3 dB PSNR) and geometric reconstruction under challenging conditions.
arXiv Detail & Related papers (2025-03-19T19:56:18Z)
- GAS: Generative Avatar Synthesis from a Single Image [54.95198111659466]
We present a framework for synthesizing view-consistent and temporally coherent avatars from a single image. Our approach combines the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model.
arXiv Detail & Related papers (2025-02-10T19:00:39Z)
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images. Our method coherently integrates two key priors, the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
- MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement [23.707586182294932]
Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data, or from 3D inconsistencies stemming from a lack of comprehensive multi-view knowledge.
We introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image.
arXiv Detail & Related papers (2024-08-26T12:10:52Z)
- Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Existing multi-person methods suffer from two main drawbacks: they are often model-based and cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.