LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh
- URL: http://arxiv.org/abs/2502.09617v1
- Date: Thu, 13 Feb 2025 18:59:19 GMT
- Title: LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh
- Authors: Jing Wen, Alexander G. Schwing, Shenlong Wang
- Abstract summary: Generalizable rendering of an animatable human avatar from sparse inputs relies on data priors and inductive biases extracted from training on large data.
We propose an iterative feedback update framework, which successively improves the canonical human shape representation during reconstruction.
Our approach reconstructs an animatable representation from sparse inputs in less than 1 s, renders views at 95.1 FPS at $1024 \times 1024$, and achieves PSNR/LPIPS*/FID of 24.65/110.82/51.27 on THuman2.0.
- Score: 102.24454703207194
- Abstract: Generalizable rendering of an animatable human avatar from sparse inputs relies on data priors and inductive biases extracted from training on large data to avoid scene-specific optimization and to enable fast reconstruction. This raises two main challenges: First, unlike iterative gradient-based adjustment in scene-specific optimization, generalizable methods must reconstruct the human shape representation in a single pass at inference time. Second, rendering should be computationally efficient yet high-resolution. To address both challenges, we augment the recently proposed dual shape representation, which combines the benefits of a mesh and Gaussian points, in two ways. To improve reconstruction, we propose an iterative feedback update framework, which successively improves the canonical human shape representation during reconstruction. To achieve computationally efficient yet high-resolution rendering, we study a coupled multi-resolution Gaussians-on-Mesh representation. We evaluate the proposed approach on the challenging THuman2.0, XHuman and AIST++ datasets. Our approach reconstructs an animatable representation from sparse inputs in less than 1 s, renders views at 95.1 FPS at $1024 \times 1024$, and achieves PSNR/LPIPS*/FID of 24.65/110.82/51.27 on THuman2.0, outperforming the state-of-the-art in rendering quality.
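The iterative feedback idea can be made concrete with a short sketch: the current canonical estimate is compared against the source views, and the resulting feedback signal drives a learned update, repeated for a small fixed number of steps so reconstruction stays a single feed-forward pass. The module names, feature sizes, and the `feedback_fn` signal below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FeedbackUpdater(nn.Module):
    """Predicts a correction to per-vertex canonical features (hypothetical)."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, feats: torch.Tensor, feedback: torch.Tensor) -> torch.Tensor:
        # feats: (V, feat_dim) canonical per-vertex features
        # feedback: (V, 3) signal derived from the rendering error in the
        # source views (how this is computed is an assumption here)
        return feats + self.net(torch.cat([feats, feedback], dim=-1))

def reconstruct(feats, updater, feedback_fn, views, num_iters: int = 3):
    """Refine the canonical representation with a fixed, small number of
    learned update steps, keeping inference a single feed-forward pass."""
    for _ in range(num_iters):
        feedback = feedback_fn(feats, views)  # re-render and compare (assumed)
        feats = updater(feats, feedback)      # learned iterative update
    return feats
```

Because the number of update steps is fixed and small, such a loop adds little latency, which is consistent with the sub-second reconstruction time reported in the abstract.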
Related papers
- Few-Shot Multi-Human Neural Rendering Using Geometry Constraints [8.819403814092865]
We present a method for recovering the shape and radiance of a scene consisting of multiple people given only a few images.
Existing approaches using implicit neural representations have achieved impressive results that deliver accurate geometry and appearance.
We propose a neural implicit reconstruction method that addresses the inherent challenges of this task through the following contributions.
arXiv Detail & Related papers (2025-02-11T00:10:58Z)
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving a much higher rendering speed.
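A minimal sketch of the lifting step described above, assuming a standard pinhole camera model: each pixel of the 2D parameter map is back-projected along its viewing ray using the predicted depth to obtain a Gaussian center in 3D. The tensor layout is an assumption for illustration, not the paper's exact formulation.

```python
import torch

def unproject_gaussian_centers(depth: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """depth: (H, W) predicted depth map; K: (3, 3) pinhole intrinsics.
    Returns (H*W, 3) Gaussian centers in camera coordinates."""
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    # Back-project each pixel along its ray: x = d * K^{-1} [u, v, 1]^T
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)
    rays = pix @ torch.linalg.inv(K).T
    return rays * depth.reshape(-1, 1)
```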
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
- EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse View [2.11923215233494]
Generalizable neural radiance field (NeRF) enables neural-based digital human rendering without per-scene retraining.
We propose a generalizable human NeRF framework that achieves high-quality and real-time rendering with sparse input views.
arXiv Detail & Related papers (2024-10-16T05:08:00Z)
- Generalizable Human Gaussians for Sparse View Synthesis [48.47812125126829]
This paper introduces a new method to learn generalizable human Gaussians that allows photorealistic and accurate view-rendering of a new human subject from a limited set of sparse views.
A pivotal innovation of our approach involves reformulating the learning of 3D Gaussian parameters into a regression process defined on the 2D UV space of a human template.
Our method outperforms recent methods on both within-dataset generalization as well as cross-dataset generalization settings.
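The UV-space reformulation can be sketched as a plain 2D regression: a CNN predicts per-texel Gaussian parameters over the template's UV map, so each texel corresponds to one Gaussian anchored at the matching surface point. The network, channel layout (offset, scale, rotation, color, opacity), and resolution below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UVGaussianRegressor(nn.Module):
    def __init__(self, in_ch: int = 3, uv_res: int = 256):
        super().__init__()
        # Per-texel Gaussian parameters: 3 offset + 3 scale + 4 rotation
        # (quaternion) + 3 color + 1 opacity = 14 channels (assumed layout).
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 14, 3, padding=1),
        )
        self.uv_res = uv_res

    def forward(self, uv_feats: torch.Tensor) -> torch.Tensor:
        # uv_feats: (B, in_ch, uv_res, uv_res) features sampled into UV space
        # returns: (B, 14, uv_res, uv_res); each texel yields one Gaussian
        # anchored at the corresponding point on the human template surface.
        return self.cnn(uv_feats)
```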
arXiv Detail & Related papers (2024-07-17T17:56:30Z)
- CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians [18.42203035154126]
We introduce a structured Gaussian representation that can be controlled in 2D image space.
We then constrain the Gaussians, in particular their positions, preventing them from moving independently during optimization.
We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
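One way to realize Gaussians that are "controlled in 2D image space" is to tie each Gaussian to a fixed pixel ray and optimize only a scalar depth, so positions cannot drift independently in 3D. The sketch below shows this parameterization as an assumption consistent with the summary, not necessarily the paper's exact scheme.

```python
import torch

class RayConstrainedGaussians(torch.nn.Module):
    """Per-pixel Gaussians whose centers move only along fixed camera rays."""
    def __init__(self, ray_origins: torch.Tensor, ray_dirs: torch.Tensor):
        super().__init__()
        self.register_buffer("origins", ray_origins)  # (N, 3), fixed
        self.register_buffer("dirs", ray_dirs)        # (N, 3), fixed unit rays
        # Only a scalar depth per Gaussian is optimized; exp keeps it positive.
        self.log_depth = torch.nn.Parameter(torch.zeros(ray_dirs.shape[0]))

    def positions(self) -> torch.Tensor:
        # Centers stay on their rays, so neighboring Gaussians cannot
        # drift apart independently during optimization.
        return self.origins + self.dirs * self.log_depth.exp().unsqueeze(-1)
```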
arXiv Detail & Related papers (2024-03-28T15:27:13Z)
- GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis [70.24111297192057]
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in real time.
The proposed method enables 2K-resolution rendering under a sparse-view camera setting.
arXiv Detail & Related papers (2023-12-04T18:59:55Z)
- LookinGood^π: Real-time Person-independent Neural Re-rendering for High-quality Human Performance Capture [13.026888802770902]
We propose a novel neural re-rendering approach aimed at improving, in real time, the rendering quality of low-quality reconstructions produced by a human performance capture system.
Our key idea is to utilize the rendered image of reconstructed geometry as the guidance to assist the prediction of person-specific details from few reference images.
We demonstrate that our method outperforms state-of-the-art methods at producing high-fidelity images on unseen people.
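A hedged sketch of the guidance idea: the rendered image of the coarse reconstructed geometry conditions a refinement network that predicts person-specific detail as a residual, drawing on a few reference images warped to the target view. The architecture and the two-reference setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GuidedReRenderer(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 channels for the coarse render + 3 per reference image (2 refs).
        self.refine = nn.Sequential(
            nn.Conv2d(3 + 3 * 2, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, coarse_render: torch.Tensor, refs: torch.Tensor) -> torch.Tensor:
        # coarse_render: (B, 3, H, W) render of the reconstructed geometry;
        # refs: (B, 2, 3, H, W) reference images warped to the target view.
        # The coarse render guides a residual prediction of fine detail.
        x = torch.cat([coarse_render, refs.flatten(1, 2)], dim=1)
        return coarse_render + self.refine(x)
```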
arXiv Detail & Related papers (2021-12-15T11:00:21Z)
- Deep Neural Networks are Surprisingly Reversible: A Baseline for Zero-Shot Inversion [90.65667807498086]
This paper presents a zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation.
We empirically show that modern classification models on ImageNet can, surprisingly, be inverted, allowing an approximate recovery of the original $224 \times 224$ px images from a representation after more than 20 layers.
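For context, the standard optimization-based baseline for feature inversion is sketched below: a random image is optimized until its deep features match the target representation. The summary describes a *direct* zero-shot framework, so this generic baseline is shown only to make the inversion task concrete; the ResNet-50 feature extractor is an arbitrary illustrative choice.

```python
import torch
import torchvision.models as models

# Arbitrary architecture choice for illustration (untrained weights here).
model = models.resnet50(weights=None).eval()
feature_extractor = torch.nn.Sequential(*list(model.children())[:-2])
for p in feature_extractor.parameters():
    p.requires_grad_(False)

def invert(target_feat: torch.Tensor, steps: int = 500, lr: float = 0.1) -> torch.Tensor:
    """Optimize a random 224x224 image until its deep features match target_feat."""
    x = torch.randn(1, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(feature_extractor(x), target_feat)
        loss.backward()
        opt.step()
    return x.detach()

# Usage: target = feature_extractor(real_image); recon = invert(target)
```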
arXiv Detail & Related papers (2021-07-13T18:01:43Z)
- Deep Variational Network Toward Blind Image Restoration [60.45350399661175]
Blind image restoration is a common yet challenging problem in computer vision.
We propose a novel blind image restoration method that aims to integrate the advantages of both model-based and learning-based approaches.
Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over current state-of-the-art methods.
arXiv Detail & Related papers (2020-08-25T03:30:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.