Enhancing Monocular 3D Hand Reconstruction with Learned Texture Priors
- URL: http://arxiv.org/abs/2508.09629v1
- Date: Wed, 13 Aug 2025 08:59:51 GMT
- Title: Enhancing Monocular 3D Hand Reconstruction with Learned Texture Priors
- Authors: Giorgos Karvounas, Nikolaos Kyriazis, Iason Oikonomidis, Georgios Pavlakos, Antonis A. Argyros
- Abstract summary: We revisit the role of texture in monocular 3D hand reconstruction, not as an afterthought for photorealism, but as a dense, spatially grounded cue. We propose a lightweight texture module that embeds per-pixel observations into UV texture space. The resulting system improves both accuracy and realism, demonstrating the value of appearance-guided alignment in hand reconstruction.
- Score: 14.079851805480425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit the role of texture in monocular 3D hand reconstruction, not as an afterthought for photorealism, but as a dense, spatially grounded cue that can actively support pose and shape estimation. Our observation is simple: even in high-performing models, the overlay between predicted hand geometry and image appearance is often imperfect, suggesting that texture alignment may be an underused supervisory signal. We propose a lightweight texture module that embeds per-pixel observations into UV texture space and enables a novel dense alignment loss between predicted and observed hand appearances. Our approach assumes access to a differentiable rendering pipeline and a model that maps images to 3D hand meshes with known topology, allowing us to back-project a textured hand onto the image and perform pixel-based alignment. The module is self-contained and easily pluggable into existing reconstruction pipelines. To isolate and highlight the value of texture-guided supervision, we augment HaMeR, a high-performing yet unadorned transformer architecture for 3D hand pose estimation. The resulting system improves both accuracy and realism, demonstrating the value of appearance-guided alignment in hand reconstruction.
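The abstract's core mechanism, embedding per-pixel observations into UV texture space and comparing them against the predicted hand appearance, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the per-pixel UV coordinates and hand mask are assumed to come from a differentiable render of the predicted mesh, and the simple scatter-average plus L1 comparison stands in for the paper's texture module and dense alignment loss.

```python
import numpy as np

def unproject_to_uv(image, uv_coords, mask, tex_size=64):
    """Scatter observed pixel colors into UV texture space.

    image:     (H, W, 3) input RGB image.
    uv_coords: (H, W, 2) per-pixel UV coordinates in [0, 1], assumed to
               come from rendering the predicted mesh (hypothetical input).
    mask:      (H, W) boolean hand-region mask.
    Returns an averaged (tex_size, tex_size, 3) texture and a validity mask.
    """
    tex = np.zeros((tex_size, tex_size, 3))
    count = np.zeros((tex_size, tex_size, 1))
    us = (uv_coords[..., 0] * (tex_size - 1)).astype(int)
    vs = (uv_coords[..., 1] * (tex_size - 1)).astype(int)
    # Accumulate each masked pixel's color into its UV texel.
    for y, x in zip(*np.nonzero(mask)):
        tex[vs[y, x], us[y, x]] += image[y, x]
        count[vs[y, x], us[y, x]] += 1
    valid = count[..., 0] > 0
    tex[valid] /= count[valid]  # average colors landing on the same texel
    return tex, valid

def dense_alignment_loss(observed_tex, predicted_tex, valid):
    """Mean L1 distance between observed and predicted textures,
    computed only on texels that received at least one observation."""
    if not valid.any():
        return 0.0
    return float(np.abs(observed_tex[valid] - predicted_tex[valid]).mean())
```

In a real pipeline both steps would be differentiable (e.g. with soft rasterization and bilinear splatting) so the loss can backpropagate into pose and shape parameters; the loop here is only for clarity.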
Related papers
- A Scalable Attention-Based Approach for Image-to-3D Texture Mapping [3.8476192001237597]
High-quality textures are critical for realistic 3D content creation. Existing generative methods are slow, rely on UV maps, and often fail to remain faithful to a reference image. We propose a transformer-based framework that predicts a 3D texture field directly from a single image and a mesh.
arXiv Detail & Related papers (2025-09-05T14:18:52Z) - HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision. We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects. Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z) - DreamPolish: Domain Score Distillation With Progressive Geometry Generation [66.94803919328815]
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process.
In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward the desired texture domain.
arXiv Detail & Related papers (2024-11-03T15:15:01Z) - GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement [51.97726804507328]
We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images.
arXiv Detail & Related papers (2024-06-09T05:19:24Z) - HiFiHR: Enhancing 3D Hand Reconstruction from a Single Image via High-Fidelity Texture [40.012406098563204]
We present HiFiHR, a high-fidelity hand reconstruction approach that utilizes render-and-compare in the learning-based framework from a single image.
Experimental results on public benchmarks including FreiHAND and HO-3D demonstrate that our method outperforms the state-of-the-art hand reconstruction methods in texture reconstruction quality.
arXiv Detail & Related papers (2023-08-25T18:48:40Z) - TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering [54.35405028643051]
We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone.
Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps.
We adopt a neural implicit surface reconstruction method, which allows for a high-quality mesh.
arXiv Detail & Related papers (2023-03-27T10:07:52Z) - Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement [78.48648360358193]
We present a novel framework that generates textured surface meshes from images.
Our approach begins by efficiently initializing the geometry and view-dependency appearance with a NeRF.
We jointly refine the appearance with geometry and bake it into texture images for real-time rendering.
arXiv Detail & Related papers (2023-03-03T17:14:44Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z) - ReFu: Refine and Fuse the Unobserved View for Detail-Preserving Single-Image 3D Human Reconstruction [31.782985891629448]
Single-image 3D human reconstruction aims to reconstruct the 3D textured surface of the human body given a single image.
We propose ReFu, a coarse-to-fine approach that refines the projected backside view image and fuses the refined image to predict the final human body.
arXiv Detail & Related papers (2022-11-09T09:14:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.