MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation
- URL: http://arxiv.org/abs/2512.07165v1
- Date: Mon, 08 Dec 2025 04:56:46 GMT
- Title: MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation
- Authors: Muyu Xu, Fangneng Zhan, Xiaoqin Zhang, Ling Shao, Shijian Lu
- Abstract summary: MuSASplat is a novel framework that dramatically reduces the computational burden of training pose-free feed-forward 3D Gaussian splatting models. Central to our approach is a lightweight Multi-Scale Adapter that enables efficient fine-tuning of ViT-based architectures with only a small fraction of the training parameters.
- Score: 92.57609195819647
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sparse-view 3D Gaussian splatting seeks to render high-quality novel views of 3D scenes from a limited set of input images. While recent pose-free feed-forward methods leveraging pre-trained 3D priors have achieved impressive results, most rely on full fine-tuning of large Vision Transformer (ViT) backbones and incur substantial GPU costs. In this work, we introduce MuSASplat, a novel framework that dramatically reduces the computational burden of training pose-free feed-forward 3D Gaussian splatting models with little compromise in rendering quality. Central to our approach is a lightweight Multi-Scale Adapter that enables efficient fine-tuning of ViT-based architectures with only a small fraction of the training parameters. This design avoids the prohibitive GPU overhead of previous full-model adaptation techniques while maintaining high fidelity in novel view synthesis, even with very sparse input views. In addition, we introduce a Feature Fusion Aggregator that integrates features across input views effectively and efficiently. Unlike widely adopted memory banks, the Feature Fusion Aggregator ensures consistent geometric integration across input views while significantly reducing memory usage, training complexity, and computational cost. Extensive experiments across diverse datasets show that MuSASplat achieves state-of-the-art rendering quality with significantly fewer parameters and training resources than existing methods.
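The abstract does not specify either component's exact architecture, but the two ideas map naturally onto standard building blocks: a bottleneck adapter wrapped around frozen ViT blocks, and cross-view attention in place of a memory bank. The PyTorch sketch below is a speculative illustration only: the module names (MultiScaleAdapter, FeatureFusionAggregator), dimensions, pooling scales, and attention-based fusion are all assumptions, not the paper's actual design.

```python
# A minimal sketch of the two components described in the abstract.
# All names, dimensions, and placement choices are illustrative
# assumptions; the paper's actual architecture may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleAdapter(nn.Module):
    """Bottleneck adapter operating on tokens at several scales.

    Hypothetical reading of the "lightweight Multi-Scale Adapter": a
    frozen ViT block gains a small residual branch whose parameters
    are the only ones updated during fine-tuning.
    """

    def __init__(self, dim: int = 768, bottleneck: int = 64, scales=(1, 2, 4)):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.scales = scales
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim). Pool tokens at multiple scales,
        # process each in the bottleneck, and merge back residually.
        b, n, d = x.shape
        h = self.down(x)                                  # (b, n, bottleneck)
        pooled = torch.zeros_like(h)
        for s in self.scales:
            if s == 1:
                pooled = pooled + h
            else:
                # Coarsen along the token axis, then upsample back.
                hs = h.transpose(1, 2)                    # (b, bottleneck, n)
                hs = F.avg_pool1d(hs, s, ceil_mode=True)
                hs = F.interpolate(hs, size=n, mode="linear")
                pooled = pooled + hs.transpose(1, 2)
        return x + self.up(self.act(pooled / len(self.scales)))


class FeatureFusionAggregator(nn.Module):
    """Cross-view attention that fuses per-view ViT features.

    Assumed stand-in for the Feature Fusion Aggregator: each view's
    tokens attend to the tokens of all views in a single pass,
    avoiding a persistent memory bank.
    """

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, tokens, dim)
        b, v, n, d = views.shape
        flat = self.norm(views.reshape(b, v * n, d))      # all views, one sequence
        fused, _ = self.attn(flat, flat, flat)
        return (views.reshape(b, v * n, d) + fused).reshape(b, v, n, d)
```

In a setup like this, only the adapter and aggregator parameters would be registered as trainable while the ViT backbone stays frozen, which is the mechanism behind the small trainable-parameter fraction the abstract claims.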
Related papers
- C3G: Learning Compact 3D Representations with 2K Gaussians [55.04010158339562]
Recent approaches use per-pixel 3D Gaussian Splatting for reconstruction, followed by a 2D-to-3D feature lifting stage for scene understanding. We propose C3G, a novel feed-forward framework that estimates compact 3D Gaussians only at essential spatial locations.
arXiv Detail & Related papers (2025-12-03T17:59:05Z)
- MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction [32.14335364083271]
We present Multi-Baseline Gaussian Splatting (MuGS), a feed-forward approach for novel view synthesis. MuGS effectively handles diverse baseline settings, including sparse input views with both small and large baselines. We demonstrate promising zero-shot performance on the LLFF and Mip-NeRF 360 datasets.
arXiv Detail & Related papers (2025-08-06T10:34:24Z)
- No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views [17.221166075016257]
SPFSplat is an efficient framework for 3D Gaussian splatting from sparse multi-view images. It employs a shared feature extraction backbone, enabling simultaneous prediction of 3D Gaussian primitives and camera poses (see the pose/Gaussian head sketch following this list). It achieves state-of-the-art performance in novel view synthesis even under significant viewpoint changes and limited image overlap.
arXiv Detail & Related papers (2025-08-02T03:19:13Z)
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors [22.561786156613525]
We propose SparseGS-W, a novel framework for synthesizing novel views of large-scale scenes from unconstrained in-the-wild images. We leverage geometric priors and constrained diffusion priors to compensate for the lack of multi-view information from extremely sparse input. SparseGS-W achieves state-of-the-art performance not only in full-reference metrics, but also in commonly used non-reference metrics such as FID, ClipIQA, and MUSIQ.
arXiv Detail & Related papers (2025-03-25T08:40:40Z)
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices. Our framework capitalizes on the speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
- MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields [100.90743697473232]
Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering. Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images. We propose a view synthesis framework based on 3D Gaussian Splatting, enabling scene reconstruction from sparse views.
arXiv Detail & Related papers (2024-10-15T08:39:05Z)
- WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections [8.261637198675151]
Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics.
We propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections.
Our approach outperforms existing approaches in rendering quality for novel view and appearance synthesis, with high convergence and rendering speed.
arXiv Detail & Related papers (2024-06-04T15:17:37Z)
- FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes [50.534213038479926]
FreeSplat is capable of reconstructing geometrically consistent 3D scenes from long-sequence input for free-view synthesis.
We propose a simple but effective free-view training strategy that ensures robust view synthesis across a broader view range regardless of the number of input views.
arXiv Detail & Related papers (2024-05-28T08:40:14Z)
- CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting [68.94594215660473]
We propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS).
We exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms (see the anchor-residual sketch following this list).
Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy or rendering quality.
arXiv Detail & Related papers (2024-04-15T04:50:39Z)
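As a rough illustration of the anchor-plus-residual idea in the CompGS summary above, the sketch below encodes each primitive as a quantized residual against its nearest anchor. The parameter layout, the uniform quantizer, and the function names are assumptions made for illustration, not CompGS's actual compression scheme.

```python
# Hypothetical anchor-plus-residual encoding in the spirit of the CompGS
# summary; the layout and quantizer are illustrative assumptions.
import numpy as np


def assign_anchors(params: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Map each primitive to its nearest anchor by position (first 3 dims = xyz)."""
    dists = np.linalg.norm(params[:, None, :3] - anchors[None, :, :3], axis=-1)
    return dists.argmin(axis=1)


def encode(params: np.ndarray, anchors: np.ndarray, step: float = 1e-3):
    """Store each primitive as a small quantized residual against its anchor."""
    idx = assign_anchors(params, anchors)
    residual = params - anchors[idx]
    # int16 residuals assume small offsets; a real codec would clamp/entropy-code.
    return idx, np.round(residual / step).astype(np.int16)


def decode(idx: np.ndarray, q: np.ndarray, anchors: np.ndarray,
           step: float = 1e-3) -> np.ndarray:
    """Reconstruct approximate primitive parameters from anchors and residuals."""
    return anchors[idx] + q.astype(np.float32) * step
```

Only the anchors are kept at full precision; every other primitive reduces to an index plus integer residuals, which is where the compactness comes from.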
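Similarly, the SPFSplat summary above describes a shared backbone that predicts Gaussian primitives and camera poses in a single pass. Below is a minimal, speculative PyTorch sketch of that layout; the encoder, the 14-channel per-pixel Gaussian parameterization, and the 9-dim (6D rotation + translation) pose head are illustrative assumptions, not SPFSplat's actual architecture.

```python
# Speculative sketch of a shared backbone feeding joint Gaussian/pose heads.
import torch
import torch.nn as nn


class SharedBackboneHeads(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # Stand-in for a ViT encoder: one patchify conv + nonlinearity.
        self.backbone = nn.Sequential(nn.Conv2d(3, dim, 8, stride=8), nn.GELU())
        # Per-pixel Gaussian parameters (assumed 14: xyz, scale, rot, opacity, rgb).
        self.gaussian_head = nn.Conv2d(dim, 14, 1)
        # Per-view pose as 6D rotation + translation (9 numbers).
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(dim, 9))

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)  # one pass feeds both heads
        return self.gaussian_head(feats), self.pose_head(feats)
```

The point of the shared backbone is that pose estimation and Gaussian prediction reuse the same features, so no separate pose network or ground-truth poses are needed.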