Fugu-MT Paper Translation (Abstract): TurboVGGT: Fast Visual Geometry Reconstruction with Adaptive Alternating Attention

Paper abstract: TurboVGGT: Fast Visual Geometry Reconstruction with Adaptive Alternating Attention

arxiv url: http://arxiv.org/abs/2605.14315v1
Date: Thu, 14 May 2026 03:24:09 GMT
Status: Translation complete
System update date: 2026-05-15 21:45:34.601737
Title: TurboVGGT: Fast Visual Geometry Reconstruction with Adaptive Alternating Attention
Title (reference translation): TurboVGGT: Fast Visual Geometry Reconstruction with Adaptive Alternating Attention
Authors: David Huang, Guile Wu, Chengjie Huang, Bingbing Liu, Dongfeng Bai,
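Abstract summary: TurboVGGT employs an efficient visual geometry transformer with adaptive alternating attention for fast multi-view 3D reconstruction. Within the adaptive sparse global attention, TurboVGGT adaptively learns representative tokens with varying sparsity levels for global geometry modeling. Experiments on multiple 3D reconstruction benchmarks show that TurboVGGT achieves fast multi-view reconstruction while maintaining competitive reconstruction quality.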
Reference score (independently computed attention level): 21.29668311125256
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent feed-forward 3D reconstruction methods, such as visual geometry transformers, have substantially advanced the traditional per-scene optimization paradigm by enabling effective multi-view reconstruction in a single forward pass. However, most existing methods struggle to achieve a balance between reconstruction quality and computational efficiency, which limits their scalability and efficiency. Although some efficient visual geometry transformers have recently emerged, they typically use the same sparsity ratio across layers and frames and lack mechanisms to adaptively learn representative tokens to capture global relationships, leading to suboptimal performance. In this work, we propose TurboVGGT, a novel approach that employs an efficient visual geometry transformer with adaptive alternating attention for fast multi-view 3D reconstruction. Specifically, TurboVGGT employs an end-to-end trainable framework with adaptive sparse global attention guided by adaptive sparsity selection to capture global relationships across frames and frame attention to aggregate local details within each frame. In the adaptive sparse global attention, TurboVGGT adaptively learns representative tokens with varying sparsity levels for global geometry modeling, considering that token importance varies across frames, attention layers operate on tokens at different levels of abstraction, and global dependencies rely on structurally informative regions. Extensive experiments on multiple 3D reconstruction benchmarks demonstrate that TurboVGGT achieves fast multi-view reconstruction while maintaining competitive reconstruction quality compared with state-of-the-art methods. Project page: https://turbovggt.github.io/.
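Abstract (reference translation): Recent feed-forward 3D reconstruction methods, such as visual geometry transformers, have substantially advanced the traditional per-scene optimization paradigm by performing effective multi-view reconstruction in a single forward pass. However, most existing methods struggle to balance reconstruction quality against computational efficiency, which limits their scalability. Although some efficient visual geometry transformers have emerged recently, they typically use the same sparsity ratio across layers and frames and lack a mechanism for adaptively learning representative tokens that capture global relationships, which leads to suboptimal performance. This paper proposes TurboVGGT, which uses an efficient visual geometry transformer with adaptive alternating attention for fast multi-view 3D reconstruction. Specifically, TurboVGGT adopts an end-to-end trainable framework that combines adaptive sparse global attention, guided by adaptive sparsity selection, to capture global relationships across frames, with frame attention to aggregate local details within each frame. Within the adaptive sparse global attention, TurboVGGT adaptively learns representative tokens with varying sparsity levels for global geometry modeling, considering that token importance varies across frames, that attention layers operate on tokens at different levels of abstraction, and that global dependencies rely on structurally informative regions. Experiments on multiple 3D reconstruction benchmarks show that TurboVGGT achieves fast multi-view reconstruction while maintaining competitive reconstruction quality compared with state-of-the-art methods. Project page: https://turbovggt.github.io/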
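
Illustrative sketch (not from the paper): the abstract describes alternating between frame attention within each frame and sparse global attention over representative tokens, with a sparsity level that can differ per frame. The NumPy code below is a minimal toy version of that idea under loud assumptions: there are no learned projections, the token L2 norm stands in for a learned importance score, the `keep_per_frame` budgets are hand-picked rather than learned, and the order of the two attention steps is a guess. All function names are hypothetical and do not come from the TurboVGGT code base.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention; q: (..., Nq, D), k and v: (..., Nk, D).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def frame_attention(tokens):
    # Dense self-attention inside each frame; tokens: (F, N, D).
    return attention(tokens, tokens, tokens)

def sparse_global_attention(tokens, keep_per_frame):
    # Every token attends to a per-frame subset of "representative" tokens.
    # ASSUMPTION: importance = token L2 norm, a placeholder for a learned score.
    # ASSUMPTION: keep_per_frame is given; the paper describes learning it.
    F, N, D = tokens.shape
    importance = np.linalg.norm(tokens, axis=-1)           # (F, N)
    selected = []
    for f in range(F):
        idx = np.argsort(-importance[f])[:keep_per_frame[f]]
        selected.append(tokens[f, idx])
    kv = np.concatenate(selected, axis=0)                  # (K, D), K = sum of budgets
    q = tokens.reshape(F * N, D)
    return attention(q, kv, kv).reshape(F, N, D)

def alternating_block(tokens, keep_per_frame):
    # One frame-attention step followed by one sparse global step, with residuals.
    tokens = tokens + frame_attention(tokens)
    tokens = tokens + sparse_global_attention(tokens, keep_per_frame)
    return tokens

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16, 8))                        # 3 frames, 16 tokens, dim 8
y = alternating_block(x, keep_per_frame=[4, 8, 2])
print(y.shape)  # (3, 16, 8)
```

With these toy budgets the global step scores 48 queries against 14 selected tokens instead of all 48, which is where a saving of this kind would come from. How TurboVGGT actually scores tokens and chooses sparsity levels is not specified in the abstract.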
