Learning Efficient and Generalizable Human Representation with Human Gaussian Model
- URL: http://arxiv.org/abs/2507.18758v1
- Date: Thu, 24 Jul 2025 19:18:59 GMT
- Title: Learning Efficient and Generalizable Human Representation with Human Gaussian Model
- Authors: Yifan Liu, Shengjun Zhang, Chensheng Dai, Yang Chen, Hao Liu, Chen Li, Yueqi Duan
- Abstract summary: We propose the Human Gaussian Graph to model the connection between predicted Gaussians and the human SMPL mesh. We show that we can leverage information from all frames to recover an animatable human representation. Experimental results on novel view synthesis and novel pose animation demonstrate the efficiency and generalization of our method.
- Score: 25.864364910265127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling animatable human avatars from videos is a long-standing and challenging problem. While conventional methods require per-instance optimization, recent feed-forward methods have been proposed to generate 3D Gaussians with a learnable network. However, these methods predict Gaussians for each frame independently, without fully capturing the relations of Gaussians from different timestamps. To address this, we propose Human Gaussian Graph to model the connection between predicted Gaussians and human SMPL mesh, so that we can leverage information from all frames to recover an animatable human representation. Specifically, the Human Gaussian Graph contains dual layers where Gaussians are the first layer nodes and mesh vertices serve as the second layer nodes. Based on this structure, we further propose the intra-node operation to aggregate various Gaussians connected to one mesh vertex, and inter-node operation to support message passing among mesh node neighbors. Experimental results on novel view synthesis and novel pose animation demonstrate the efficiency and generalization of our method.
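To make the dual-layer structure concrete, below is a minimal PyTorch-style sketch of a Human Gaussian Graph layer, assuming nearest-vertex assignment of Gaussians to SMPL vertices and simple mean/linear operators; the class name, tensor shapes, and operator choices are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of the dual-layer Human Gaussian Graph described above.
# The nearest-vertex assignment and the mean/linear operators are illustrative
# assumptions, not the paper's exact intra-/inter-node operations.
import torch
import torch.nn as nn


class HumanGaussianGraph(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.intra_mlp = nn.Linear(feat_dim, feat_dim)   # aggregate Gaussians -> vertex
        self.inter_mlp = nn.Linear(feat_dim, feat_dim)   # message passing among vertices

    def forward(self, gauss_feat, gauss_xyz, verts, vert_adj):
        """
        gauss_feat: (N, C) features of Gaussians predicted from all frames
        gauss_xyz:  (N, 3) Gaussian centers
        verts:      (V, 3) SMPL mesh vertices (second-layer nodes)
        vert_adj:   (V, V) float mesh adjacency
        """
        # Connect each Gaussian (first-layer node) to its nearest SMPL vertex.
        assign = torch.cdist(gauss_xyz, verts).argmin(dim=1)          # (N,)

        # Intra-node operation: average the Gaussians attached to each vertex.
        V, C = verts.shape[0], gauss_feat.shape[1]
        vert_feat = torch.zeros(V, C, device=gauss_feat.device)
        count = torch.zeros(V, 1, device=gauss_feat.device)
        vert_feat.index_add_(0, assign, gauss_feat)
        count.index_add_(0, assign, torch.ones_like(gauss_feat[:, :1]))
        vert_feat = self.intra_mlp(vert_feat / count.clamp(min=1))

        # Inter-node operation: pass messages along mesh edges (row-normalized).
        norm_adj = vert_adj / vert_adj.sum(dim=1, keepdim=True).clamp(min=1)
        vert_feat = vert_feat + self.inter_mlp(norm_adj @ vert_feat)

        # Scatter refined vertex features back to the Gaussians they cover.
        return gauss_feat + vert_feat[assign]
```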
Related papers
- Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images [12.274418254425019]
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. We propose Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method.
arXiv Detail & Related papers (2025-03-20T16:56:13Z)
- RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images [39.03889696169877]
RoGSplat is a novel approach for synthesizing high-fidelity novel views of unseen humans from sparse multi-view images. Our method outperforms state-of-the-art methods in novel view synthesis and cross-dataset generalization.
arXiv Detail & Related papers (2025-03-18T12:18:34Z)
- NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage novel-view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z)
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving superior rendering speed.
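As a rough illustration of how predicted depth can lift pixel-wise 2D Gaussian parameter maps into 3D, here is a hedged sketch; the function name, channel layout, and camera conventions are assumptions, not the GPS-Gaussian+ implementation.

```python
# Hypothetical sketch of lifting pixel-wise 2D Gaussian parameter maps to 3D
# with a predicted depth map. The intrinsics layout and parameter-map channels
# are assumptions for illustration.
import torch


def lift_param_maps(param_map, depth, K, c2w):
    """
    param_map: (C, H, W) per-pixel Gaussian parameters (opacity, scale, rotation, color, ...)
    depth:     (H, W) predicted depth for the same view
    K:         (3, 3) camera intrinsics
    c2w:       (4, 4) camera-to-world transform
    Returns per-pixel 3D Gaussian centers (H*W, 3) and flattened parameters (H*W, C).
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()        # (H, W, 3)

    # Back-project: X_cam = depth * K^{-1} [u, v, 1]^T
    cam = (pix.reshape(-1, 3) @ torch.inverse(K).T) * depth.reshape(-1, 1)

    # Move to world space with the camera-to-world transform.
    cam_h = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=1)         # (H*W, 4)
    world = (cam_h @ c2w.T)[:, :3]

    return world, param_map.reshape(param_map.shape[0], -1).T            # (H*W, 3), (H*W, C)
```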
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
- GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling [11.91812502521729]
Gaussian splatting has demonstrated excellent performance for view synthesis and scene reconstruction. Since each Gaussian primitive encodes both appearance and geometry, appearance modeling requires a large number of Gaussian primitives. We propose to employ a per-primitive texture representation so that even a single Gaussian can be used to capture appearance details.
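A minimal sketch of the per-primitive texturing idea, assuming each 2D Gaussian stores a small RGB texture sampled with a local UV coordinate; the class, texture resolution, and bilinear lookup are illustrative assumptions rather than the GStex implementation.

```python
# Hypothetical sketch of per-primitive texturing: each 2D Gaussian carries its
# own small texture, so appearance detail no longer requires splitting geometry
# into more primitives.
import torch
import torch.nn.functional as F


class TexturedGaussians(torch.nn.Module):
    def __init__(self, num_primitives: int, tex_res: int = 8):
        super().__init__()
        # Geometry: one position / scale / rotation per primitive.
        self.xyz = torch.nn.Parameter(torch.zeros(num_primitives, 3))
        self.scale = torch.nn.Parameter(torch.ones(num_primitives, 2))
        self.rot = torch.nn.Parameter(torch.zeros(num_primitives, 4))
        # Appearance: a small RGB texture per primitive instead of a single color.
        self.tex = torch.nn.Parameter(torch.rand(num_primitives, 3, tex_res, tex_res))

    def shade(self, prim_idx, uv):
        """Bilinearly sample each hit primitive's texture at local uv in [-1, 1]."""
        tex = self.tex[prim_idx]                      # (M, 3, R, R)
        grid = uv.view(-1, 1, 1, 2)                   # (M, 1, 1, 2)
        return F.grid_sample(tex, grid, align_corners=True).view(-1, 3)
```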
arXiv Detail & Related papers (2024-09-19T17:58:44Z)
- Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos [58.22272760132996]
We show that existing 4D Gaussian methods dramatically fail in this setup because the monocular setting is underconstrained.
We propose Dynamic Gaussian Marbles, which consist of three core modifications that target the difficulties of the monocular setting.
We evaluate on the Nvidia Dynamic Scenes dataset and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly outperforms other Gaussian baselines in quality.
arXiv Detail & Related papers (2024-06-26T19:37:07Z)
- Generalizable Human Gaussians from Single-View Image [52.100234836129786]
We introduce a single-view generalizable Human Gaussian Model (HGM). Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians. To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch.
arXiv Detail & Related papers (2024-06-10T06:38:11Z)
- UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling [71.87807614875497]
We propose UV Gaussians, which models the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures.
We collect and process a new dataset of human motion, which includes multi-view images, scanned models, parametric model registration, and corresponding texture maps. Experimental results demonstrate that our method achieves state-of-the-art novel view and novel pose synthesis.
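A hedged sketch of how Gaussian attributes stored in a 2D UV-space texture might be sampled onto a deformed mesh; the 14-channel layout and activation choices are assumptions, not the UV Gaussians implementation.

```python
# Hypothetical sketch of reading Gaussian attributes from a learned UV texture
# and anchoring them on a deformed mesh. The channel layout
# (offset + scale + rotation + color + opacity = 14) is an assumption.
import torch
import torch.nn.functional as F


def uv_gaussians(uv_texture, uv_coords, deformed_verts):
    """
    uv_texture:     (14, H, W) learned Gaussian attributes stored in UV space
    uv_coords:      (V, 2) per-vertex UV coordinates in [-1, 1]
    deformed_verts: (V, 3) mesh vertices after learned deformation
    """
    grid = uv_coords.view(1, -1, 1, 2)                                   # (1, V, 1, 2)
    attrs = F.grid_sample(uv_texture[None], grid, align_corners=True)    # (1, 14, V, 1)
    attrs = attrs[0, :, :, 0].T                                          # (V, 14)

    offset, scale, rot, rgb, opacity = attrs.split([3, 3, 4, 3, 1], dim=1)
    centers = deformed_verts + offset          # Gaussians ride on the deformed mesh
    return centers, scale.exp(), rot, rgb.sigmoid(), opacity.sigmoid()
```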
arXiv Detail & Related papers (2024-03-18T09:03:56Z)
- Mesh Graphormer [17.75480888764098]
We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image.
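A minimal sketch of a graph-convolution-reinforced transformer block, assuming self-attention followed by a graph convolution over the mesh adjacency; the ordering, normalization, and dimensions are illustrative assumptions, not the Mesh Graphormer architecture.

```python
# Hypothetical sketch: self-attention captures global relations among mesh
# tokens, and a graph convolution over the mesh adjacency injects local
# structure before the feed-forward layer.
import torch
import torch.nn as nn


class GraphormerBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.graph_w = nn.Linear(dim, dim)        # graph-conv weight
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, adj):
        """x: (B, V, C) joint/vertex tokens, adj: (V, V) float mesh adjacency."""
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        x = x + self.graph_w((adj / deg) @ self.norm2(x))   # graph convolution
        return x + self.ffn(self.norm3(x))
```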
arXiv Detail & Related papers (2021-04-01T06:16:36Z)
- CatGCN: Graph Convolutional Networks with Categorical Node Features [99.555850712725]
CatGCN is tailored for graph learning when the node features are categorical.
We train CatGCN in an end-to-end fashion and demonstrate it on semi-supervised node classification.
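A hedged sketch of the categorical-feature handling: category ids are embedded, pooled per node, and propagated with a normalized adjacency; the field layout and mean pooling are assumptions, not CatGCN's exact interaction modeling.

```python
# Hypothetical sketch: categorical node features are embedded and pooled into a
# dense representation, then propagated with a symmetrically normalized
# adjacency for semi-supervised node classification.
import torch
import torch.nn as nn


class CatGCNSketch(nn.Module):
    def __init__(self, num_categories: int, emb_dim: int = 64, num_classes: int = 7):
        super().__init__()
        self.embed = nn.Embedding(num_categories, emb_dim)
        self.gcn1 = nn.Linear(emb_dim, emb_dim)
        self.gcn2 = nn.Linear(emb_dim, num_classes)

    def forward(self, cat_ids, adj):
        """
        cat_ids: (N, F) integer category ids per node (F categorical fields)
        adj:     (N, N) float adjacency with self-loops
        """
        # Turn categorical fields into a dense node feature by mean-pooling embeddings.
        h = self.embed(cat_ids).mean(dim=1)                       # (N, D)

        # Symmetrically normalized propagation, as in a standard GCN layer.
        d_inv_sqrt = adj.sum(dim=1).clamp(min=1).pow(-0.5)
        norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

        h = torch.relu(self.gcn1(norm_adj @ h))
        return self.gcn2(norm_adj @ h)                            # per-node class logits
```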
arXiv Detail & Related papers (2020-09-11T09:25:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.