Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion
- URL: http://arxiv.org/abs/2601.00328v1
- Date: Thu, 01 Jan 2026 12:48:56 GMT
- Title: Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion
- Authors: Yingzhi Tang, Qijian Zhang, Junhui Hou,
- Abstract summary: This paper introduces JGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation. Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
- Score: 57.09673862519791
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving consistent and high-fidelity geometry and appearance reconstruction of 3D digital humans from a single RGB image is inherently a challenging task. Existing studies typically resort to decoupled pipelines for geometry estimation and appearance synthesis, often hindering unified reconstruction and causing inconsistencies. This paper introduces JGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation and formulates the generation process as bridge diffusion. Observing that directly integrating heterogeneous input conditions (e.g., depth maps, SMPL models) leads to substantial training difficulties, we unify all conditions into 3D Gaussian representations, which can be further compressed into a unified latent space through a shared sparse variational autoencoder (VAE). Subsequently, the specialized form of bridge diffusion enables the generation process to start from a partial observation of the target latent code and focus solely on inferring the missing components. Finally, a dedicated decoding module extracts the complete 3D human geometric structure and renders novel views from the inferred latent representation. Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality, including challenging in-the-wild scenarios. Our code will be made publicly available at https://github.com/haiantyz/JGA-LBD.
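The abstract's key idea, starting the diffusion process from a partial observation of the target latent and inferring only the missing components, can be illustrated with a toy sketch. The following is a minimal, hypothetical inpainting-style sampler, not the paper's actual algorithm: `bridge_inpaint`, `denoise_fn`, and the noise schedule are all illustrative assumptions. Observed latent entries (where `mask` is true) are kept anchored at every step, while the masked-out entries are iteratively re-estimated and re-noised toward the current noise level.

```python
import numpy as np

def bridge_inpaint(z_partial, mask, denoise_fn, num_steps=50, rng=None):
    """Toy bridge-style sampler (illustrative only).

    z_partial : latent array with observed entries filled in
    mask      : boolean array, True where the latent is observed
    denoise_fn: callable (z, t) -> estimate of the clean latent
    """
    rng = rng or np.random.default_rng(0)
    # Initialize: keep observed entries, fill missing ones with noise.
    z = np.where(mask, z_partial, rng.standard_normal(z_partial.shape))
    for t in np.linspace(1.0, 0.0, num_steps):
        z_hat = denoise_fn(z, t)            # model's clean-latent estimate
        noise = rng.standard_normal(z.shape)
        z_missing = z_hat + t * noise       # re-noise toward current level t
        # Observed components stay fixed; only missing ones are inferred.
        z = np.where(mask, z_partial, z_missing)
    return z
```

In a real system the denoiser would be a learned network conditioned on the unified Gaussian-derived latent; here any callable with the `(z, t)` signature works, which makes the anchoring behavior easy to verify in isolation.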
Related papers
- TriaGS: Differentiable Triangulation-Guided Geometric Consistency for 3D Gaussian Splatting [2.441486089588484]
3D Gaussian Splatting is crucial for real-time novel view synthesis due to its efficiency and ability to render images. This paper introduces a novel method that improves reconstruction by enforcing global geometry consistency through constrained multi-view triangulation. We demonstrate the effectiveness of our method across multiple photorealistic datasets, achieving state-of-the-art results.
arXiv Detail & Related papers (2025-12-06T03:45:39Z) - ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents [31.495577251319315]
ArtiLatent is a generative framework that synthesizes human-made 3D objects with fine-grained geometry, accurate articulation, and realistic appearance.
arXiv Detail & Related papers (2025-10-24T13:08:15Z) - Visibility-Aware Densification for 3D Gaussian Splatting in Dynamic Urban Scenes [7.253732091582086]
VAD-GS is a 3DGS framework tailored for geometry recovery in challenging urban scenes. Our method identifies unreliable geometry structures via voxel-based visibility reasoning. It selects informative supporting views through diversity-aware view selection, and recovers missing structures via patch matching-based stereo reconstruction.
arXiv Detail & Related papers (2025-10-10T13:22:12Z) - AlignGS: Aligning Geometry and Semantics for Robust Indoor Reconstruction from Sparse Views [18.361136390711415]
The demand for semantically rich 3D models of indoor scenes is rapidly growing, driven by applications in augmented reality, virtual reality, and robotics. Existing methods often treat semantics as a passive feature painted on an already-formed, and potentially flawed, geometry. This paper introduces AlignGS, a novel framework that actualizes this vision by pioneering a synergistic, end-to-end optimization of geometry and semantics.
arXiv Detail & Related papers (2025-10-09T06:30:20Z) - UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation [98.40254523605581]
UniLat3D is a unified framework that encodes geometry and appearance in a single latent space. Our key contribution is a geometry-appearance Unified VAE, which compresses high-resolution sparse features into a compact latent representation. UniLat3D produces high-quality 3D assets in seconds from a single image.
arXiv Detail & Related papers (2025-09-29T17:21:23Z) - HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping [11.035994094874141]
HBSplat is a framework that seamlessly integrates robust structural cues, virtual view constraints, and occluded region completion. HBSplat sets a new state-of-the-art, achieving up to 21.13 dB PSNR and 0.189 LPIPS, while maintaining real-time inference.
arXiv Detail & Related papers (2025-09-29T15:03:31Z) - Dens3R: A Foundation Model for 3D Geometry Prediction [44.13431776180547]
Dens3R is a 3D foundation model designed for joint geometric dense prediction. By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities.
arXiv Detail & Related papers (2025-07-22T07:22:30Z) - Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation [62.87088388345378]
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. The method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images. Cross-modal attention distillation is proposed to ensure accurate alignment between generated images and geometry.
arXiv Detail & Related papers (2025-06-13T16:19:00Z) - Geometry-Editable and Appearance-Preserving Object Composition [67.98806888489385]
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties. Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation. We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
arXiv Detail & Related papers (2025-05-27T09:05:28Z) - GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
arXiv Detail & Related papers (2024-03-18T17:50:41Z) - GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.