Spectral Graphormer: Spectral Graph-based Transformer for Egocentric
Two-Hand Reconstruction using Multi-View Color Images
- URL: http://arxiv.org/abs/2308.11015v1
- Date: Mon, 21 Aug 2023 20:07:02 GMT
- Title: Spectral Graphormer: Spectral Graph-based Transformer for Egocentric
Two-Hand Reconstruction using Multi-View Color Images
- Authors: Tze Ho Elden Tse, Franziska Mueller, Zhengyang Shen, Danhang Tang,
Thabo Beeler, Mingsong Dou, Yinda Zhang, Sasa Petrovic, Hyung Jin Chang,
Jonathan Taylor, Bardia Doosti
- Abstract summary: We propose a novel transformer-based framework that reconstructs two high fidelity hands from multi-view RGB images.
We show that our framework is able to produce realistic two-hand reconstructions and demonstrate the generalisation of synthetic-trained models to real data.
- Score: 33.70056950818641
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a novel transformer-based framework that reconstructs two high
fidelity hands from multi-view RGB images. Unlike existing hand pose estimation
methods, where one typically trains a deep network to regress hand model
parameters from single RGB image, we consider a more challenging problem
setting where we directly regress the absolute root poses of two-hands with
extended forearm at high resolution from egocentric view. As existing datasets
are either infeasible for egocentric viewpoints or lack background variations,
we create a large-scale synthetic dataset with diverse scenarios and collect a
real dataset from multi-calibrated camera setup to verify our proposed
multi-view image feature fusion strategy. To make the reconstruction physically
plausible, we propose two strategies: (i) a coarse-to-fine spectral graph
convolution decoder to smoothen the meshes during upsampling and (ii) an
optimisation-based refinement stage at inference to prevent self-penetrations.
Through extensive quantitative and qualitative evaluations, we show that our
framework is able to produce realistic two-hand reconstructions and demonstrate
the generalisation of synthetic-trained models to real data, as well as
real-time AR/VR applications.
Related papers
- Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration [47.26304397935705]
Image restoration aims to recover high-quality images from degraded inputs.
Existing methods lack a unified training benchmark for iterations and configurations.
We introduce a large-scale IR dataset called ReSyn, which employs a novel image filtering method based on image complexity.
arXiv Detail & Related papers (2024-12-05T02:11:51Z) - FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration [66.61201445650323]
Existing methods suffer from a generalization bottleneck in real-world scenarios.
We contribute a million-scale dataset with two notable advantages over existing training data.
We propose a robust model, FoundIR, to better address a broader range of restoration tasks in real-world scenarios.
arXiv Detail & Related papers (2024-12-02T12:08:40Z) - Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for
Loss-free Multi-Exposure Image Fusion [60.221404321514086]
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels.
This paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions.
arXiv Detail & Related papers (2023-09-03T08:07:26Z) - Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The input are high-resolution RGBD images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv Detail & Related papers (2023-06-06T15:07:39Z) - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from
Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z) - Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z) - PaMIR: Parametric Model-Conditioned Implicit Representation for
Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z) - Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.