JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction
- URL: http://arxiv.org/abs/2602.04317v1
- Date: Wed, 04 Feb 2026 08:33:51 GMT
- Title: JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction
- Authors: Zihan Lou, Jinlong Fan, Sihan Ma, Yuxiang Yang, Jing Zhang,
- Abstract summary: We present JOintGS, a unified framework that jointly optimize camera extrinsics, human poses, and 3D Gaussian representations.<n>Experiments on NeuMan and EMDB datasets demonstrate that JOintGS achieves superior reconstruction quality.
- Score: 18.636227266388218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing high-fidelity animatable 3D human avatars from monocular RGB videos remains challenging, particularly in unconstrained in-the-wild scenarios where camera parameters and human poses from off-the-shelf methods (e.g., COLMAP, HMR2.0) are often inaccurate. Splatting (3DGS) advances demonstrate impressive rendering quality and real-time performance, they critically depend on precise camera calibration and pose annotations, limiting their applicability in real-world settings. We present JOintGS, a unified framework that jointly optimizes camera extrinsics, human poses, and 3D Gaussian representations from coarse initialization through a synergistic refinement mechanism. Our key insight is that explicit foreground-background disentanglement enables mutual reinforcement: static background Gaussians anchor camera estimation via multi-view consistency; refined cameras improve human body alignment through accurate temporal correspondence; optimized human poses enhance scene reconstruction by removing dynamic artifacts from static constraints. We further introduce a temporal dynamics module to capture fine-grained pose-dependent deformations and a residual color field to model illumination variations. Extensive experiments on NeuMan and EMDB datasets demonstrate that JOintGS achieves superior reconstruction quality, with 2.1~dB PSNR improvement over state-of-the-art methods on NeuMan dataset, while maintaining real-time rendering. Notably, our method shows significantly enhanced robustness to noisy initialization compared to the baseline.Our source code is available at https://github.com/MiliLab/JOintGS.
Related papers
- WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos [68.43355277637882]
We introduce WildGHand, an optimization-based framework that enables self-adaptive 3D Gaussian splatting on in-the-wild videos.<n>We further curate a dataset of monocular hand videos captured under diverse perturbations to benchmark in-the-wild hand avatar reconstruction.
arXiv Detail & Related papers (2026-02-24T05:14:05Z) - RobustSplat++: Decoupling Densification, Dynamics, and Illumination for In-the-Wild 3DGS [85.90134051583368]
3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling.<n>Existing methods struggle with accurately modeling in-the-wild scenes affected by transient objects and illuminations.<n>We propose RobustSplat++, a robust solution based on several critical designs.
arXiv Detail & Related papers (2025-12-04T14:05:09Z) - UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring [31.35713139629235]
Reconstructing 3D scenes from monocular video often fails due to severe motion blur caused by camera and object motion.<n>We introduce a unified optimization framework by incorporating camera poses as learnable parameters.<n>Our method achieves significant gains in reconstruction quality and pose estimation accuracy over prior dynamic deblurring methods.
arXiv Detail & Related papers (2025-08-31T13:01:03Z) - Enhancing Novel View Synthesis from extremely sparse views with SfM-free 3D Gaussian Splatting Framework [14.927184256861807]
We propose a novel SfM-free 3DGS-based method that jointly estimates camera poses and reconstructs 3D scenes from extremely sparse-view inputs.<n>Our method significantly outperforms other state-of-the-art 3DGS-based approaches, achieving a remarkable 2.75dB improvement in PSNR under extremely sparse-view conditions.
arXiv Detail & Related papers (2025-08-21T11:25:24Z) - GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering [54.489285024494855]
Video stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent.<n>Existing approaches, depending on the domain they operate, suffer from several issues that degrade the user experience.<n>We introduce textbfGaVS, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent local reconstruction and rendering' paradigm.
arXiv Detail & Related papers (2025-06-30T15:24:27Z) - DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM)<n>It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene.<n>It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z) - FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images.<n>Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange.<n>We develop two specialized variants--for object-centric and scene-level reconstruction--trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z) - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.<n>Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion [25.54868552979793]
We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data.
Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods.
arXiv Detail & Related papers (2024-03-20T06:19:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.