Fugu-MT Paper Translation (Abstract): VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction

Paper overview: VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction

arxiv url: http://arxiv.org/abs/2510.19578v1
Date: Wed, 22 Oct 2025 13:28:49 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:15.82867
Title: VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction
Title (reference translation): VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction
Authors: Junhong Lin, Kangli Wang, Shunzhou Wang, Songlin Fan, Ge Li, Wei Gao
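Abstract summary: We introduce Visual Gaussian Driving (VGD), a novel feed-forward end-to-end learning framework designed to address this challenge. The proposed method is shown to significantly outperform state-of-the-art methods in both objective metrics and subjective quality under various settings.
Reference score (independently computed attention measure): 26.668204454537246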
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feed-forward surround-view autonomous driving scene reconstruction offers fast, generalizable inference ability, which faces the core challenge of ensuring generalization while elevating novel view quality. Due to the surround-view with minimal overlap regions, existing methods typically fail to ensure geometric consistency and reconstruction quality for novel views. To tackle this tension, we claim that geometric information must be learned explicitly, and the resulting features should be leveraged to guide the elevating of semantic quality in novel views. In this paper, we introduce Visual Gaussian Driving (VGD), a novel feed-forward end-to-end learning framework designed to address this challenge. To achieve generalizable geometric estimation, we design a lightweight variant of the VGGT architecture to efficiently distill its geometric priors from the pre-trained VGGT to the geometry branch. Furthermore, we design a Gaussian Head that fuses multi-scale geometry tokens to predict Gaussian parameters for novel view rendering, which shares the same patch backbone as the geometry branch. Finally, we integrate multi-scale features from both geometry and Gaussian head branches to jointly supervise a semantic refinement model, optimizing rendering quality through feature-consistent learning. Experiments on nuScenes demonstrate that our approach significantly outperforms state-of-the-art methods in both objective metrics and subjective quality under various settings, which validates VGD's scalability and high-fidelity surround-view reconstruction.
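Abstract (reference translation): Feed-forward surround-view autonomous driving scene reconstruction offers fast, generalizable inference. Because surround views have minimal overlapping regions, existing methods typically fail to ensure geometric consistency and reconstruction quality for novel views. To address this tension, geometric information must be learned explicitly, and the resulting features should be leveraged to guide the improvement of semantic quality in novel views. In this paper, we introduce Visual Gaussian Driving (VGD), a novel feed-forward end-to-end learning framework designed to address this challenge. To achieve generalizable geometric estimation, we design a lightweight variant of the VGGT architecture and efficiently distill geometric priors from the pre-trained VGGT into the geometry branch. Furthermore, we design a Gaussian Head that fuses multi-scale geometry tokens to predict Gaussian parameters for novel view rendering. Finally, we integrate multi-scale features from both the geometry and Gaussian head branches to jointly supervise a semantic refinement model, optimizing rendering quality through feature-consistent learning. Experiments on nuScenes show that the approach significantly outperforms state-of-the-art methods in both objective metrics and subjective quality, validating VGD's scalability and high-fidelity surround-view reconstruction.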


Related papers