Splat-SAP: Feed-Forward Gaussian Splatting for Human-Centered Scene with Scale-Aware Point Map Reconstruction
- URL: http://arxiv.org/abs/2511.22704v1
- Date: Thu, 27 Nov 2025 18:58:54 GMT
- Title: Splat-SAP: Feed-Forward Gaussian Splatting for Human-Centered Scene with Scale-Aware Point Map Reconstruction
- Authors: Boyao Zhou, Shunyuan Zheng, Zhanfeng Liao, Zihan Ma, Hanzhang Tu, Boning Liu, Yebin Liu,
- Abstract summary: We present Splat-SAP, a feed-forward approach to render views of human-centered scenes from binocular cameras with large sparsity.<n>We leverage pixel-wise point map reconstruction to represent geometry which is robust to large sparsity for its independent view modeling.
- Score: 39.835146541795986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Splat-SAP, a feed-forward approach to render novel views of human-centered scenes from binocular cameras with large sparsity. Gaussian Splatting has shown its promising potential in rendering tasks, but it typically necessitates per-scene optimization with dense input views. Although some recent approaches achieve feed-forward Gaussian Splatting rendering through geometry priors obtained by multi-view stereo, such approaches still require largely overlapped input views to establish the geometry prior. To bridge this gap, we leverage pixel-wise point map reconstruction to represent geometry which is robust to large sparsity for its independent view modeling. In general, we propose a two-stage learning strategy. In stage 1, we transform the point map into real space via an iterative affinity learning process, which facilitates camera control in the following. In stage 2, we project point maps of two input views onto the target view plane and refine such geometry via stereo matching. Furthermore, we anchor Gaussian primitives on this refined plane in order to render high-quality images. As a metric representation, the scale-aware point map in stage 1 is trained in a self-supervised manner without 3D supervision and stage 2 is supervised with photo-metric loss. We collect multi-view human-centered data and demonstrate that our method improves both the stability of point map reconstruction and the visual quality of free-viewpoint rendering.
Related papers
- SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views [18.814209805277503]
SPFSplatV2, an efficient feed-forward framework for 3D Gaussian splatting from sparse multi-view images, is presented.<n>Method achieves state-of-the-art performance in both in-domain and out-of-domain novel view synthesis.
arXiv Detail & Related papers (2025-09-21T21:37:56Z) - PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting [4.451779041553596]
3D Gaussian splatting (3DGS) is an innovative rendering technique that surpasses the neural radiance field (NeRF) in both rendering speed and visual quality.<n>We propose a Point-wise Feature-Aware Gaussian Splatting framework that enables real-time, high-quality rendering from sparse training views.
arXiv Detail & Related papers (2025-06-12T04:07:07Z) - PanopticSplatting: End-to-End Panoptic Gaussian Splatting [20.04251473153725]
We propose PanopticSplatting, an end-to-end system for open-vocabulary panoptic reconstruction.<n>Our method introduces query-guided Gaussian segmentation with local cross attention, lifting 2D instance masks without cross-frame association.<n>Our method demonstrates strong performances in 3D scene panoptic reconstruction on the ScanNet-V2 and ScanNet++ datasets.
arXiv Detail & Related papers (2025-03-23T13:45:39Z) - FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [100.45129752375658]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images.<n>Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z) - CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image [18.445769892372528]
We introduce CATSplat, a novel generalizable transformer-based framework for single-view 3D scene reconstruction.<n>By incorporating scene-specific contextual details from text embeddings through cross-attention, we pave the way for context-aware reconstruction.<n>Experiments on large-scale datasets demonstrate the state-of-the-art performance of CATSplat in single-view 3D scene reconstruction.
arXiv Detail & Related papers (2024-12-17T13:32:04Z) - GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z) - FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes [50.534213038479926]
FreeSplat is capable of reconstructing geometrically consistent 3D scenes from long sequence input towards free-view synthesis.
We propose a simple but effective free-view training strategy that ensures robust view synthesis across broader view range regardless of the number of views.
arXiv Detail & Related papers (2024-05-28T08:40:14Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.<n>Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.