PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image
- URL: http://arxiv.org/abs/2509.07552v2
- Date: Fri, 10 Oct 2025 11:20:05 GMT
- Title: PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image
- Authors: Peng Li, Yisheng He, Yingdong Hu, Yuan Dong, Weihao Yuan, Yuan Liu, Siyu Zhu, Gang Cheng, Zilong Dong, Yike Guo
- Abstract summary: We present a feed-forward framework for Gaussian full-head synthesis from a single unposed image. Unlike previous work that relies on time-consuming GAN inversion and test-time optimization, our framework can reconstruct the Gaussian full-head model given a single unposed image in a single forward pass.
- Score: 43.212662742135954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a feed-forward framework for Gaussian full-head synthesis from a single unposed image. Unlike previous work that relies on time-consuming GAN inversion and test-time optimization, our framework can reconstruct the Gaussian full-head model given a single unposed image in a single forward pass. This enables fast reconstruction and rendering during inference. To mitigate the lack of large-scale 3D head assets, we propose a large-scale synthetic dataset from trained 3D GANs and train our framework using only synthetic data. For efficient high-fidelity generation, we introduce a coarse-to-fine Gaussian head generation pipeline, where sparse points from the FLAME model interact with the image features through transformer blocks for feature extraction and coarse shape reconstruction, and are then densified for high-fidelity reconstruction. To fully leverage the prior knowledge residing in pretrained 3D GANs for effective reconstruction, we propose a dual-branch framework that aggregates the structured spherical triplane feature and unstructured point-based features for more effective Gaussian head reconstruction. Experimental results demonstrate the effectiveness of our framework compared to existing work. Project page at: https://panolam.github.io/.
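The coarse-to-fine idea in the abstract can be sketched in a few lines: learned query embeddings for sparse FLAME-like points cross-attend to image feature tokens, and the resulting coarse points are then densified by splitting each into several jittered children. This is a minimal NumPy sketch of that pattern only; all shapes, names, and the single-head attention are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cross_attention(queries, tokens, d_k):
    """Single-head cross-attention: point queries attend to image tokens."""
    scores = queries @ tokens.T / np.sqrt(d_k)        # (P, T)
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens                           # (P, d_k)

def densify(points, features, k=4, scale=0.01, rng=None):
    """Split each coarse point into k child Gaussians with jittered centers."""
    rng = rng or np.random.default_rng(0)
    offsets = rng.normal(scale=scale, size=(len(points), k, 3))
    dense_pts = (points[:, None, :] + offsets).reshape(-1, 3)
    dense_feat = np.repeat(features, k, axis=0)       # children inherit features
    return dense_pts, dense_feat

# Toy shapes: 64 sparse FLAME-like vertices, 256 image feature tokens of dim 32.
rng = np.random.default_rng(0)
flame_pts = rng.normal(size=(64, 3))
img_tokens = rng.normal(size=(256, 32))
queries = rng.normal(size=(64, 32))   # stand-in for learned point embeddings

coarse_feat = cross_attention(queries, img_tokens, d_k=32)    # (64, 32)
dense_pts, dense_feat = densify(flame_pts, coarse_feat, k=4)  # (256, 3), (256, 32)
```

In the paper's pipeline the dense features would additionally be fused with the spherical triplane branch before being decoded into Gaussian parameters; the sketch stops at the point/feature stage.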
Related papers
- Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting [33.7339252839354]
We introduce a new feed-forward architecture that detects 3D Gaussian primitives at a sub-pixel level. Inspired by keypoint detection, our decoder learns to distribute primitives across image patches. Our resulting pose-free model generates scenes in seconds, achieving state-of-the-art novel view synthesis for feed-forward models.
arXiv Detail & Related papers (2025-12-17T14:59:21Z)
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation [44.75113949778924]
ARTDECO is a unified framework that combines the efficiency of feed-forward models with the reliability of SLAM-based pipelines. We show that ARTDECO delivers interactive performance comparable to SLAM, robustness similar to feed-forward systems, and reconstruction quality close to per-scene optimization.
arXiv Detail & Related papers (2025-10-09T17:57:38Z)
- LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields [23.174562444342286]
We present Large Inverse Rendering Model (LIRM), a transformer architecture that jointly reconstructs high-quality shape, materials, and radiance fields. Our model builds upon the recent Large Reconstruction Models (LRMs) that achieve state-of-the-art sparse-view reconstruction quality.
arXiv Detail & Related papers (2025-04-28T17:48:58Z)
- HORT: Monocular Hand-held Objects Reconstruction with Transformers [61.36376511119355]
Reconstructing hand-held objects in 3D from monocular images is a significant challenge in computer vision. We propose a transformer-based model to efficiently reconstruct dense 3D point clouds of hand-held objects. Our method achieves state-of-the-art accuracy with much faster inference speed, while generalizing well to in-the-wild images.
arXiv Detail & Related papers (2025-03-27T09:45:09Z)
- FOF-X: Towards Real-time Detailed Human Reconstruction from a Single Image [64.96903230497755]
We introduce FOF-X for real-time reconstruction of detailed human geometry from a single image. The core of FOF is to factorize a 3D occupancy field into a 2D vector field, retaining topology and spatial relationships within the 3D domain. Based on FOF, we design a new reconstruction framework, FOF-X, to avoid the performance degradation caused by texture and lighting.
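The factorization mentioned above can be illustrated concretely: each pixel's 1D occupancy profile along the view axis is projected onto a small fixed basis, so a 3D grid (H, W, Z) collapses to a 2D vector field (H, W, K). The sketch below uses a plain cosine basis and least squares as stand-ins; the actual FOF basis and network decoder differ, so treat every name and choice here as an assumption.

```python
import numpy as np

def fof_encode(occ, K):
    """Project each pixel's occupancy-vs-depth profile onto K cosine basis
    functions, turning a (H, W, Z) grid into a (H, W, K) 2D vector field."""
    H, W, Z = occ.shape
    z = np.linspace(-1, 1, Z)
    basis = np.cos(np.outer(np.arange(K), np.pi * (z + 1) / 2))  # (K, Z)
    coeffs, *_ = np.linalg.lstsq(basis.T, occ.reshape(H * W, Z).T, rcond=None)
    return coeffs.T.reshape(H, W, K)

def fof_decode(coeffs, Z):
    """Evaluate the stored coefficients back into an occupancy profile."""
    H, W, K = coeffs.shape
    z = np.linspace(-1, 1, Z)
    basis = np.cos(np.outer(np.arange(K), np.pi * (z + 1) / 2))  # (K, Z)
    return (coeffs.reshape(H * W, K) @ basis).reshape(H, W, Z)

# Toy occupancy: a slab that is "inside" for z in (-0.3, 0.4) at every pixel.
Z = 64
z = np.linspace(-1, 1, Z)
occ = np.broadcast_to(((z > -0.3) & (z < 0.4)).astype(float), (8, 8, Z)).copy()

coeffs = fof_encode(occ, K=16)          # (8, 8, 16): the compact 2D field
recon = fof_decode(coeffs, Z)           # (8, 8, 64): reconstructed occupancy
err = np.abs(recon - occ).mean()        # small residual (Gibbs-style ringing)
```

Sixteen coefficients per pixel recover the slab up to ringing near the two occupancy jumps, which is the trade-off such a factorization makes.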
arXiv Detail & Related papers (2024-12-08T14:46:29Z)
- PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence [3.61512056914095]
We present PreF3R, Pose-Free Feed-forward 3D Reconstruction from an image sequence of variable length.
PreF3R removes the need for camera calibration and reconstructs the 3D Gaussian field within a canonical coordinate frame directly from a sequence of unposed images.
arXiv Detail & Related papers (2024-11-25T19:16:29Z)
- USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting [45.246178004823534]
Spike cameras, innovative neuromorphic cameras that capture scenes as a 0-1 bit stream at 40 kHz, are increasingly employed for the 3D reconstruction task. Previous spike-based 3D reconstruction approaches often employ a cascaded pipeline. We propose a synergistic optimization framework, USP-Gaussian, that unifies spike-based image reconstruction, pose correction, and Gaussian splatting into an end-to-end framework.
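The 0-1 bit stream mentioned above is worth a concrete picture: each pixel integrates incoming intensity and emits a spike whenever an accumulator crosses a threshold, so intensity can be recovered as a firing rate over a window. This toy integrate-and-fire sketch shows only that sensing model; it is not the USP-Gaussian method itself, which jointly optimizes image reconstruction, pose, and splatting.

```python
import numpy as np

def simulate_spikes(intensity, steps, threshold=1.0):
    """Integrate-and-fire pixel model: add intensity each tick, emit a 1 and
    subtract the threshold whenever the accumulator crosses it."""
    acc = np.zeros_like(intensity, dtype=float)
    spikes = np.zeros((steps,) + intensity.shape, dtype=np.uint8)
    for t in range(steps):
        acc += intensity
        fired = acc >= threshold
        spikes[t] = fired
        acc[fired] -= threshold
    return spikes

def reconstruct(spikes):
    """Mean firing rate over the window approximates the scene intensity."""
    return spikes.mean(axis=0)

rng = np.random.default_rng(0)
scene = rng.uniform(0.05, 0.95, size=(16, 16))  # ground-truth intensity in (0,1)
spikes = simulate_spikes(scene, steps=400)      # (400, 16, 16) bit stream
est = reconstruct(spikes)
err = np.abs(est - scene).max()                 # bounded by 1/steps here
```

In this deterministic model the count over T ticks differs from intensity x T by at most the leftover accumulator, so the rate estimate converges at 1/T; real spike streams add noise and motion, which is what motivates the joint optimization in the paper.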
arXiv Detail & Related papers (2024-11-15T14:15:16Z)
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices. Our framework capitalizes on the fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
- LaRa: Efficient Large-Baseline Radiance Fields [32.86296116177701]
We propose a method that unifies local and global reasoning in transformer layers, resulting in improved quality and faster convergence.
Our model represents scenes as Gaussian Volumes and combines this with an image encoder and Group Attention Layers for efficient feed-forward reconstruction.
arXiv Detail & Related papers (2024-07-05T17:59:58Z)
- GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z)
- GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction [5.112375652774415]
We propose a unified optimization framework that integrates neural signed distance fields (SDFs) with 3DGS for accurate geometry reconstruction and real-time rendering. Our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.
arXiv Detail & Related papers (2024-05-30T03:46:59Z)
- GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation [85.15374487533643]
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s.
GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information.
We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models.
arXiv Detail & Related papers (2024-03-21T17:59:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.