Spectral Graphormer: Spectral Graph-based Transformer for Egocentric
Two-Hand Reconstruction using Multi-View Color Images
- URL: http://arxiv.org/abs/2308.11015v1
- Date: Mon, 21 Aug 2023 20:07:02 GMT
- Title: Spectral Graphormer: Spectral Graph-based Transformer for Egocentric
Two-Hand Reconstruction using Multi-View Color Images
- Authors: Tze Ho Elden Tse, Franziska Mueller, Zhengyang Shen, Danhang Tang,
Thabo Beeler, Mingsong Dou, Yinda Zhang, Sasa Petrovic, Hyung Jin Chang,
Jonathan Taylor, Bardia Doosti
- Abstract summary: We propose a novel transformer-based framework that reconstructs two high fidelity hands from multi-view RGB images.
We show that our framework is able to produce realistic two-hand reconstructions and demonstrate the generalisation of synthetic-trained models to real data.
- Score: 33.70056950818641
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a novel transformer-based framework that reconstructs two high
fidelity hands from multi-view RGB images. Unlike existing hand pose estimation
methods, where one typically trains a deep network to regress hand model
parameters from single RGB image, we consider a more challenging problem
setting where we directly regress the absolute root poses of two-hands with
extended forearm at high resolution from egocentric view. As existing datasets
are either infeasible for egocentric viewpoints or lack background variations,
we create a large-scale synthetic dataset with diverse scenarios and collect a
real dataset from multi-calibrated camera setup to verify our proposed
multi-view image feature fusion strategy. To make the reconstruction physically
plausible, we propose two strategies: (i) a coarse-to-fine spectral graph
convolution decoder to smoothen the meshes during upsampling and (ii) an
optimisation-based refinement stage at inference to prevent self-penetrations.
Through extensive quantitative and qualitative evaluations, we show that our
framework is able to produce realistic two-hand reconstructions and demonstrate
the generalisation of synthetic-trained models to real data, as well as
real-time AR/VR applications.
Related papers
- Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks [50.822601495422916]
We propose to utilize exposure bracketing photography to unify image restoration and enhancement tasks.
Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data.
In particular, a temporally modulated recurrent network (TMRNet) and self-supervised adaptation method are proposed.
arXiv Detail & Related papers (2024-01-01T14:14:35Z) - Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for
Loss-free Multi-Exposure Image Fusion [60.221404321514086]
Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels.
This paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions.
arXiv Detail & Related papers (2023-09-03T08:07:26Z) - Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The input are high-resolution RGBD images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv Detail & Related papers (2023-06-06T15:07:39Z) - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from
Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z) - Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings.
We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z) - Rethinking Blur Synthesis for Deep Real-World Image Deblurring [4.00114307523959]
We propose a novel realistic blur synthesis pipeline to simulate the camera imaging process.
We develop an effective deblurring model that captures non-local dependencies and local context in the feature domain simultaneously.
A comprehensive experiment on three real-world datasets shows that the proposed deblurring model performs better than state-of-the-art methods.
arXiv Detail & Related papers (2022-09-28T06:50:16Z) - PaMIR: Parametric Model-Conditioned Implicit Representation for
Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
arXiv Detail & Related papers (2020-07-08T02:26:19Z) - Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.