Related papers: VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction

URL: http://arxiv.org/abs/2510.19578v1
Date: Wed, 22 Oct 2025 13:28:49 GMT
Title: VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction
Authors: Junhong Lin, Kangli Wang, Shunzhou Wang, Songlin Fan, Ge Li, Wei Gao
Abstract summary: We introduce Visual Gaussian Driving (VGD), a novel feed-forward end-to-end learning framework designed to address this challenge. We show that our approach significantly outperforms state-of-the-art methods in both objective metrics and subjective quality under various settings.
Score: 26.668204454537246
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feed-forward surround-view autonomous driving scene reconstruction offers fast, generalizable inference ability, which faces the core challenge of ensuring generalization while elevating novel view quality. Due to the surround-view with minimal overlap regions, existing methods typically fail to ensure geometric consistency and reconstruction quality for novel views. To tackle this tension, we claim that geometric information must be learned explicitly, and the resulting features should be leveraged to guide the elevating of semantic quality in novel views. In this paper, we introduce Visual Gaussian Driving (VGD), a novel feed-forward end-to-end learning framework designed to address this challenge. To achieve generalizable geometric estimation, we design a lightweight variant of the VGGT architecture to efficiently distill its geometric priors from the pre-trained VGGT to the geometry branch. Furthermore, we design a Gaussian Head that fuses multi-scale geometry tokens to predict Gaussian parameters for novel view rendering, which shares the same patch backbone as the geometry branch. Finally, we integrate multi-scale features from both geometry and Gaussian head branches to jointly supervise a semantic refinement model, optimizing rendering quality through feature-consistent learning. Experiments on nuScenes demonstrate that our approach significantly outperforms state-of-the-art methods in both objective metrics and subjective quality under various settings, which validates VGD's scalability and high-fidelity surround-view reconstruction.

Related papers

SparseSurf: Sparse-View 3D Gaussian Splatting for Surface Reconstruction [26.59203606048875]
We propose SparseSurf, a method that reconstructs more accurate and detailed surfaces while preserving high-quality novel view rendering. Our key insight is to introduce Stereo Geometry-Texture Alignment, which bridges rendering quality and geometry estimation. In addition, we present a Pseudo-Feature Enhanced Geometry Consistency that enforces multi-view geometric consistency.
arXiv Detail & Related papers (2025-11-18T16:24:37Z)
VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment [48.147381011235446]
3D Gaussian Splatting has recently emerged as an efficient solution for real-time novel view synthesis. We propose a novel method that enhances the geometric representation of 3D Gaussians through view alignment. Our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.
arXiv Detail & Related papers (2025-10-13T14:44:50Z)
Proximal Vision Transformer: Enhancing Feature Representation through Two-Stage Manifold Geometry [7.3623134099785155]
Vision Transformer (ViT) has become widely recognized in computer vision, leveraging its self-attention mechanism to achieve remarkable success across various tasks. This paper proposes a novel framework that integrates ViT with the proximal tools, enabling a unified geometric optimization approach. Experimental results confirm that the proposed method outperforms traditional ViT in terms of classification accuracy and data distribution.
arXiv Detail & Related papers (2025-08-23T16:39:09Z)
Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis [49.67420486373202]
GRGS is a generalizable and relightable 3D Gaussian framework for high-fidelity human novel view synthesis under diverse lighting conditions. We introduce a Lighting-aware Geometry Refinement (LGR) module trained on synthetically relit data to predict accurate depth and surface normals.
arXiv Detail & Related papers (2025-05-27T17:59:47Z)
GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity [49.31257173003408]
We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video. Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.
arXiv Detail & Related papers (2025-05-17T08:46:29Z)
Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization [27.509109317973817]
3D Gaussian Splatting (3DGS) has garnered significant attention for its high-quality rendering and fast inference speed. Previous methods primarily focus on geometry regularization, with common approaches including primitive-based and dual-model frameworks. We propose CarGS, a unified model leveraging contribution-adaptive regularization to achieve simultaneous, high-quality surface reconstruction.
arXiv Detail & Related papers (2025-03-02T12:51:38Z)
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders [87.17440422575721]
Dora-VAE is a novel approach that enhances VAE reconstruction through our proposed sharp edge sampling strategy and a dual cross-attention mechanism. Dora-VAE achieves comparable reconstruction quality to the state-of-the-art dense XCube-VAE while requiring a latent space at least 8× smaller.
arXiv Detail & Related papers (2024-12-23T18:59:06Z)
MonoGSDF: Exploring Monocular Geometric Cues for Gaussian Splatting-Guided Implicit Surface Reconstruction [86.87464903285208]
We introduce MonoGSDF, a novel method that couples primitives with a neural Signed Distance Field (SDF) for high-quality reconstruction. To handle arbitrary-scale scenes, we propose a scaling strategy for robust generalization. In experiments on real-world datasets, MonoGSDF outperforms prior methods while maintaining efficiency.
arXiv Detail & Related papers (2024-11-25T20:07:07Z)
NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views [41.03837477483364]
We propose a novel sparse-view reconstruction framework that leverages on-surface priors to achieve highly faithful surface reconstruction. Specifically, we design several constraints on global geometry alignment and local geometry refinement for jointly optimizing coarse shapes and fine details. Experimental results on the DTU and BlendedMVS datasets in two prevalent sparse settings demonstrate significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2023-12-21T16:04:45Z)
GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images. Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
DARF: Depth-Aware Generalizable Neural Radiance Field [51.29437249009986]
We propose the Depth-Aware Generalizable Neural Radiance Field (DARF) with a Depth-Aware Dynamic Sampling (DADS) strategy. Our framework infers unseen scenes at both the pixel level and the geometry level with only a few input images. Compared with state-of-the-art generalizable NeRF methods, DARF reduces samples by 50% while improving rendering quality and depth estimation.
arXiv Detail & Related papers (2022-12-05T14:00:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.