KaoLRM: Repurposing Pre-trained Large Reconstruction Models for Parametric 3D Face Reconstruction
- URL: http://arxiv.org/abs/2601.12736v1
- Date: Mon, 19 Jan 2026 05:36:59 GMT
- Title: KaoLRM: Repurposing Pre-trained Large Reconstruction Models for Parametric 3D Face Reconstruction
- Authors: Qingtian Zhu, Xu Cao, Zhixiang Wang, Yinqiang Zheng, Takafumi Taketomi
- Abstract summary: KaoLRM re-targets the learned prior of the Large Reconstruction Model (LRM) for parametric 3D face reconstruction from single-view images. Experiments on both controlled and in-the-wild benchmarks demonstrate that KaoLRM achieves superior reconstruction accuracy and cross-view consistency.
- Score: 51.67605823241639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose KaoLRM to re-target the learned prior of the Large Reconstruction Model (LRM) for parametric 3D face reconstruction from single-view images. Parametric 3D Morphable Models (3DMMs) have been widely used for facial reconstruction due to their compact and interpretable parameterization, yet existing 3DMM regressors often exhibit poor consistency across varying viewpoints. To address this, we harness the pre-trained 3D prior of LRM and incorporate FLAME-based 2D Gaussian Splatting into LRM's rendering pipeline. Specifically, KaoLRM projects LRM's pre-trained triplane features into the FLAME parameter space to recover geometry, and models appearance via 2D Gaussian primitives that are tightly coupled to the FLAME mesh. The rich prior enables the FLAME regressor to be aware of the 3D structure, leading to accurate and robust reconstructions under self-occlusions and diverse viewpoints. Experiments on both controlled and in-the-wild benchmarks demonstrate that KaoLRM achieves superior reconstruction accuracy and cross-view consistency, while existing methods remain sensitive to viewpoint variations. The code is released at https://github.com/CyberAgentAILab/KaoLRM.
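The abstract describes a two-stage design: triplane features from a pre-trained LRM are projected into the FLAME parameter space to recover geometry, and appearance is modeled with 2D Gaussian primitives tied to the FLAME mesh. The following is a minimal, hypothetical sketch of that data flow; the function names, linear regression head, and per-face Gaussian anchoring are illustrative assumptions, not the authors' actual implementation or API.

```python
# Hypothetical sketch of KaoLRM's two-stage design (all names illustrative):
# 1) pool triplane features and project them into a FLAME parameter vector;
# 2) anchor one 2D Gaussian primitive per mesh face for appearance.
import numpy as np

def regress_flame_params(triplane, W):
    """Project globally pooled triplane features into FLAME coefficients."""
    feat = triplane.mean(axis=(1, 2)).reshape(-1)   # average-pool each plane, concat
    return W @ feat                                  # linear head stands in for the regressor

def anchor_gaussians(vertices, faces):
    """Place one 2D Gaussian at each face centroid, oriented by the face normal."""
    centers = vertices[faces].mean(axis=1)
    e1 = vertices[faces[:, 1]] - vertices[faces[:, 0]]
    e2 = vertices[faces[:, 2]] - vertices[faces[:, 0]]
    normals = np.cross(e1, e2)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-8
    return centers, normals

# Toy shapes: three 8x8x16 triplanes and a 156-dim FLAME-style parameter vector
# (e.g. 100 shape + 50 expression + 6 pose coefficients; dimensions assumed).
rng = np.random.default_rng(0)
triplane = rng.standard_normal((3, 8, 8, 16))
W = rng.standard_normal((156, 3 * 16))
params = regress_flame_params(triplane, W)

# Tiny stand-in mesh (a tetrahedron) instead of the real FLAME topology.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
centers, normals = anchor_gaussians(verts, faces)
print(params.shape, centers.shape, normals.shape)  # (156,) (4, 3) (4, 3)
```

Coupling the Gaussians to mesh faces, as sketched here, is what lets the appearance model follow FLAME's deformations and stay consistent across viewpoints.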
Related papers
- GeoFusionLRM: Geometry-Aware Self-Correction for Consistent 3D Reconstruction [27.169882738788257]
Single-image 3D reconstruction with large reconstruction models (LRMs) has advanced rapidly, yet reconstructions often exhibit geometric inconsistencies and loss of detail that limit fidelity. We introduce GeoFusionLRM, a geometry-aware self-correction framework that leverages the model's own normal and depth predictions to refine structural accuracy.
arXiv Detail & Related papers (2026-02-15T12:39:04Z) - S-MUSt3R: Sliding Multi-view 3D Reconstruction [17.018626984951823]
This work proposes S-MUSt3R, a simple and efficient pipeline that extends the limits of foundation models for monocular 3D reconstruction. We show that S-MUSt3R runs successfully on long RGB sequences and produces accurate and consistent 3D reconstructions.
arXiv Detail & Related papers (2026-02-04T13:07:14Z) - LARM: A Large Articulated-Object Reconstruction Model [29.66486888001511]
LARM is a unified feedforward framework that reconstructs 3D articulated objects from sparse-view images. LARM generates auxiliary outputs such as depth maps and part masks to facilitate explicit 3D mesh extraction and joint estimation. Our pipeline eliminates the need for dense supervision and supports high-fidelity reconstruction across diverse object categories.
arXiv Detail & Related papers (2025-11-14T18:55:27Z) - GRMM: Real-Time High-Fidelity Gaussian Morphable Head Model with Learned Residuals [78.67749748078813]
3D Morphable Models (3DMMs) enable controllable facial geometry and expression editing for reconstruction, animation, and AR/VR. We introduce GRMM, the first full-head Gaussian 3D morphable model that augments a base 3DMM with residual geometry and appearance components. GRMM surpasses state-of-the-art methods in fidelity and expression accuracy while delivering interactive real-time performance.
arXiv Detail & Related papers (2025-09-02T09:43:47Z) - Sparse-View 3D Reconstruction: Recent Advances and Open Challenges [0.8583178253811411]
Sparse-view 3D reconstruction is essential for applications in which dense image acquisition is impractical. This survey reviews the latest advances in neural implicit models and explicit point-cloud-based approaches. We analyze how geometric regularization, explicit shape modeling, and generative inference are used to mitigate artifacts.
arXiv Detail & Related papers (2025-07-22T09:57:28Z) - LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields [23.174562444342286]
We present Large Inverse Rendering Model (LIRM), a transformer architecture that jointly reconstructs high-quality shape, materials, and radiance fields. Our model builds upon the recent Large Reconstruction Models (LRMs) that achieve state-of-the-art sparse-view reconstruction quality.
arXiv Detail & Related papers (2025-04-28T17:48:58Z) - DiMeR: Disentangled Mesh Reconstruction Model [29.827345186012558]
DiMeR is a novel geometry-texture disentangled feed-forward model with 3D supervision for sparse-view mesh reconstruction. We streamline the algorithm of mesh extraction by eliminating modules with low performance/cost ratios and redesigning regularization losses with 3D supervision. Extensive experiments demonstrate that DiMeR generalises across sparse-view-, single-image-, and text-to-3D tasks, consistently outperforming baselines.
arXiv Detail & Related papers (2025-04-24T15:39:20Z) - RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars [30.56664313203195]
We present Reduced Gaussian Blendshapes Avatar (RGBAvatar), a method for reconstructing animatable head avatars at speeds sufficient for on-the-fly reconstruction. Our method maps tracked 3DMM parameters into reduced blendshape weights with a learned composition, leading to a compact set of blendshape bases. We propose a local-global sampling strategy that enables direct on-the-fly reconstruction, processing incoming video streams in real time while achieving quality comparable to offline settings.
arXiv Detail & Related papers (2025-03-17T07:31:21Z) - FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange. We develop two specialized variants--for object-centric and scene-level reconstruction--trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z) - Multi-View Large Reconstruction Model via Geometry-Aware Positional Encoding and Attention [54.66152436050373]
We propose a Multi-view Large Reconstruction Model (M-LRM) to reconstruct high-quality 3D shapes from multi-views in a 3D-aware manner. Specifically, we introduce a multi-view consistent cross-attention scheme to enable M-LRM to accurately query information from the input images. Compared to previous methods, the proposed M-LRM can generate 3D shapes of high fidelity.
arXiv Detail & Related papers (2024-06-11T18:29:13Z) - MeshLRM: Large Reconstruction Model for High-Quality Meshes [52.71164862539288]
MeshLRM can reconstruct a high-quality mesh from merely four input images in less than one second. Our approach achieves state-of-the-art mesh reconstruction from sparse-view inputs and also allows for many downstream applications.
arXiv Detail & Related papers (2024-04-18T17:59:41Z) - 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop [128.07841893637337]
Regression-based methods have recently shown promising results in reconstructing human meshes from monocular images.
Minor deviations in parameters may lead to noticeable misalignment between the estimated meshes and image evidence.
We propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop to leverage a feature pyramid and rectify the predicted parameters.
arXiv Detail & Related papers (2021-03-30T17:07:49Z)
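The PyMAF summary above describes refining regressed parameters by sampling image features at the currently projected mesh and applying corrections across a feature pyramid. The following is an illustrative sketch of such a feedback loop; the function names, nearest-neighbor sampling, linear correction heads, and toy projection are all assumptions for exposition, not PyMAF's actual code.

```python
# Illustrative mesh-alignment feedback loop (names and shapes assumed): at
# each pyramid level, features sampled at the currently projected mesh points
# drive a residual update of the parameter vector.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_features(feature_map, points_2d):
    """Nearest-neighbor lookup of per-point features from an HxWxC map."""
    h, w, _ = feature_map.shape
    xs = np.clip((points_2d[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
    ys = np.clip((points_2d[:, 1] * (h - 1)).round().astype(int), 0, h - 1)
    return feature_map[ys, xs]

def feedback_loop(pyramid, params, project, heads):
    """Refine parameters level by level with residual corrections."""
    for feature_map, head in zip(pyramid, heads):
        pts = project(params)                        # mesh points under current params
        feats = sample_features(feature_map, pts).reshape(-1)
        params = params + head @ feats               # small residual update
    return params

rng = np.random.default_rng(0)
P = rng.standard_normal((10, 10))                    # toy map: 10 params -> 5 2D points
project = lambda p: sigmoid(P @ p).reshape(5, 2)     # projections squashed into [0, 1]
pyramid = [rng.standard_normal((s, s, 4)) for s in (8, 16, 32)]
heads = [0.01 * rng.standard_normal((10, 5 * 4)) for _ in pyramid]
refined = feedback_loop(pyramid, np.zeros(10), project, heads)
print(refined.shape)  # (10,)
```

The key idea the sketch captures is that each correction is conditioned on evidence gathered at the current mesh estimate, so small parameter errors that cause visible misalignment can be progressively corrected.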
This list is automatically generated from the titles and abstracts of the papers in this site.