Related papers: PALM: A Dataset and Baseline for Learning Multi-subject Hand Prior

URL: http://arxiv.org/abs/2511.05403v1
Date: Fri, 07 Nov 2025 16:28:25 GMT
Title: PALM: A Dataset and Baseline for Learning Multi-subject Hand Prior
Authors: Zicong Fan, Edoardo Remelli, David Dimond, Fadime Sener, Liuhao Ge, Bugra Tekin, Cem Keskin, Shreyas Hampali,
Abstract summary: We present PALM, a large-scale dataset comprising 13k high-quality hand scans from 263 subjects and 90k multi-view images. PALM's scale and diversity make it a valuable real-world resource for hand modeling and related research.
Score: 19.320761012596183
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The ability to grasp objects, signal with gestures, and share emotion through touch all stem from the unique capabilities of human hands. Yet creating high-quality personalized hand avatars from images remains challenging due to complex geometry, appearance, and articulation, particularly under unconstrained lighting and limited views. Progress has also been limited by the lack of datasets that jointly provide accurate 3D geometry, high-resolution multiview imagery, and a diverse population of subjects. To address this, we present PALM, a large-scale dataset comprising 13k high-quality hand scans from 263 subjects and 90k multi-view images, capturing rich variation in skin tone, age, and geometry. To show its utility, we present a baseline PALM-Net, a multi-subject prior over hand geometry and material properties learned via physically based inverse rendering, enabling realistic, relightable single-image hand avatar personalization. PALM's scale and diversity make it a valuable real-world resource for hand modeling and related research.

Related papers

PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images [23.745241278910946]
PF-LHM is a large human reconstruction model that generates high-quality 3D avatars in seconds from one or multiple casually captured pose-free images. Our method unifies single- and multi-image 3D human reconstruction, achieving high-fidelity and animatable 3D human avatars without requiring camera and human pose annotations.
arXiv Detail & Related papers (2025-06-16T17:59:56Z)
EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework. EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures. This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z)
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs. We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation [11.876066932162873]
HUP-3D is a 3D multiview synthetic dataset for hand-ultrasound probe pose estimation. Our dataset consists of over 31k sets of movements. Our approach includes an image rendering concept that enhances diversity with various hand and arm textures.
arXiv Detail & Related papers (2024-07-12T12:25:42Z)
CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization [27.55341255800119]
We present CharacterGen, a framework developed to efficiently generate 3D characters. A transformer-based, generalizable sparse-view reconstruction model is the other core component of our approach. We have curated a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model.
arXiv Detail & Related papers (2024-02-27T05:10:59Z)
Animatable Virtual Humans: Learning pose-dependent human representations in UV space for interactive performance synthesis [11.604386285817302]
We learn pose dependent appearance and geometry from highly accurate dynamic mesh sequences. We encode both pose-dependent appearance and geometry in the consistent UV space of the SMPL model.
arXiv Detail & Related papers (2023-10-05T15:49:44Z)
Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis [52.720314035084215]
This work aims to use a general deep learning framework to synthesize free-viewpoint images of arbitrary human performers. We present a simple yet powerful framework, named Generalizable Neural Performer (GNR), that learns a generalizable and robust neural body representation. Experiments on GeneBody-1.0 and ZJU-Mocap show better robustness of our methods than recent state-of-the-art generalizable methods.
arXiv Detail & Related papers (2022-04-25T17:14:22Z)
Facial Geometric Detail Recovery via Implicit Representation [147.07961322377685]
We present a robust texture-guided geometric detail recovery approach using only a single in-the-wild facial image. Our method combines high-quality texture completion with the powerful expressiveness of implicit surfaces. Our method not only recovers accurate facial details but also decomposes normals, albedos, and shading parts in a self-supervised way.
arXiv Detail & Related papers (2022-03-18T01:42:59Z)
HSPACE: Synthetic Parametric Humans Animated in Complex Environments [67.8628917474705]
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments. We combine a hundred diverse individuals of varying ages, gender, proportions, and ethnicity, with hundreds of motions and scenes, in order to generate an initial dataset of over 1 million frames. Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z)
Zero-Shot Text-Guided Object Generation with Dream Fields [111.06026544180398]
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision. In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
arXiv Detail & Related papers (2021-12-02T17:53:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.