Fugu-MT Paper Translation (Abstract): VFM-Recon: Unlocking Cross-Domain Scene-Level Neural Reconstruction with Scale-Aligned Foundation Priors

Paper overview: VFM-Recon: Unlocking Cross-Domain Scene-Level Neural Reconstruction with Scale-Aligned Foundation Priors

arxiv url: http://arxiv.org/abs/2603.12657v1
Date: Fri, 13 Mar 2026 05:00:44 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:11.91051
Title: VFM-Recon: Unlocking Cross-Domain Scene-Level Neural Reconstruction with Scale-Aligned Foundation Priors
Title (reference translation): VFM-Recon: Cross-Domain Scene-Level Neural Reconstruction with Scale-Aligned Foundation Priors
Authors: Yuhang Ming, Tingkang Xi, Xingrui Yang, Lixin Yang, Yong Peng, Cewu Lu, Wanzeng Kong
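Abstract summary: VFM-Recon is the first attempt to bridge transferable VFM priors with the scale-consistency requirements of scene-level neural reconstruction. Specifically, it first introduces a lightweight scale alignment stage that restores multi-view scale coherence. It then integrates pretrained VFM features into the neural volumetric reconstruction pipeline via lightweight task-specific adapters.
Reference score (independently computed attention score): 49.39553550491549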
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scene-level neural volumetric reconstruction from monocular videos remains challenging, especially under severe domain shifts. Although recent advances in vision foundation models (VFMs) provide transferable generalized priors learned from large-scale data, their scale-ambiguous predictions are incompatible with the scale consistency required by volumetric fusion. To address this gap, we present VFM-Recon, the first attempt to bridge transferable VFM priors with the scale-consistency requirements of scene-level neural reconstruction. Specifically, we first introduce a lightweight scale alignment stage that restores multi-view scale coherence. We then integrate pretrained VFM features into the neural volumetric reconstruction pipeline via lightweight task-specific adapters, which are trained for reconstruction while preserving the cross-domain robustness of pretrained representations. We train our model on the ScanNet train split and evaluate on both the in-distribution ScanNet test split and the out-of-distribution TUM RGB-D and Tanks and Temples datasets. The results demonstrate that our model achieves state-of-the-art performance across all dataset domains. In particular, on the challenging outdoor Tanks and Temples dataset, our model achieves an F1 score of 70.1 in reconstructed mesh evaluation, substantially outperforming the closest competitor, VGGT, which only attains 51.8.
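Abstract (reference translation): Scene-level neural volumetric reconstruction from monocular video is difficult, particularly under severe domain shift. Recent vision foundation models (VFMs) provide transferable, generalized priors learned from large-scale data, but their scale-ambiguous predictions are incompatible with the scale consistency that volumetric fusion requires. To address this gap, we introduce VFM-Recon, the first attempt to bridge transferable VFM priors with the scale-consistency requirements of scene-level neural reconstruction. Specifically, we first introduce a lightweight scale alignment stage that restores multi-view scale coherence. We then integrate pretrained VFM features into the neural volumetric reconstruction pipeline via lightweight task-specific adapters, which are trained for reconstruction while preserving the cross-domain robustness of the pretrained representations. We train on the ScanNet train split and evaluate on the ScanNet test split as well as the TUM RGB-D and Tanks and Temples datasets. The results show that our model achieves state-of-the-art performance across all dataset domains. In particular, on the challenging outdoor Tanks and Temples dataset, it reaches an F1 score of 70.1 in reconstructed-mesh evaluation, substantially outperforming the closest competitor, VGGT.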
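The abstract reports an F1 score for reconstructed-mesh evaluation but does not give the evaluation protocol. As a rough illustration only, the sketch below computes the F-score commonly used in 3D reconstruction benchmarks: precision is the share of predicted points lying within a distance threshold of the ground truth, recall is the share of ground-truth points lying within that threshold of the prediction, and F1 is their harmonic mean. The function names, the threshold `tau`, and the brute-force nearest-neighbour search are assumptions made for this example, not details taken from the paper.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, distance to its nearest neighbour in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def f_score(pred_pts, gt_pts, tau):
    """Return (precision, recall, F1) in percent at distance threshold tau.

    pred_pts / gt_pts are sequences of 3D points sampled from the predicted
    and ground-truth surfaces. tau is a hypothetical threshold; real
    benchmarks fix it per dataset or per scene.
    """
    if not pred_pts or not gt_pts:
        return 0.0, 0.0, 0.0
    d_pred = nearest_distances(pred_pts, gt_pts)  # accuracy side
    d_gt = nearest_distances(gt_pts, pred_pts)    # completeness side
    precision = 100.0 * sum(d < tau for d in d_pred) / len(d_pred)
    recall = 100.0 * sum(d < tau for d in d_gt) / len(d_gt)
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: two predicted points are close to the ground truth, one is far away,
# and two ground-truth points are not covered by the prediction.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.01), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
precision, recall, f1 = f_score(pred, gt, tau=0.05)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

The brute-force search is quadratic in the number of points, so it only suits small samples; for full meshes a spatial index such as a k-d tree would normally replace `nearest_distances`.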
