Fugu-MT Paper Translation (Abstract): LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

Paper summary: LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

arxiv url: http://arxiv.org/abs/2604.05182v1
Date: Mon, 06 Apr 2026 21:21:12 GMT
Status: Translation complete
System update date: 2026-04-08 17:42:09.498537
Title: LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows
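Authors: Zhengqin Li, Cheng Zhang, Jakob Engel, Zhao Dong
Abstract summary: We introduce the Large Sparse Reconstruction Model to study how scaling context windows affects feed-forward 3D reconstruction. Expanding the context window, by substantially increasing the number of active object and image tokens, markedly narrows the gap to dense-view optimization and enables high-fidelity 3D object reconstruction and inverse rendering.
Reference score (independently computed measure of attention): 10.300202521638274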
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce the Large Sparse Reconstruction Model to study how scaling transformer context windows impacts feed-forward 3D reconstruction. Although recent object-centric feed-forward methods deliver robust, high-quality reconstruction, they still lag behind dense-view optimization in recovering fine-grained texture and appearance. We show that expanding the context window -- by substantially increasing the number of active object and image tokens -- remarkably narrows this gap and enables high-fidelity 3D object reconstruction and inverse rendering. To scale effectively, we adapt native sparse attention in our architecture design, unlocking its capacity for 3D reconstruction with three key contributions: (1) an efficient coarse-to-fine pipeline that focuses computation on informative regions by predicting sparse high-resolution residuals; (2) a 3D-aware spatial routing mechanism that establishes accurate 2D-3D correspondences using explicit geometric distances rather than standard attention scores; and (3) a custom block-aware sequence parallelism strategy utilizing an All-gather-KV protocol to balance dynamic, sparse workloads across GPUs. As a result, LSRM handles 20x more object tokens and >2x more image tokens than prior state-of-the-art (SOTA) methods. Extensive evaluations on standard novel-view synthesis benchmarks show substantial gains over the current SOTA, yielding 2.5 dB higher PSNR and 40% lower LPIPS. Furthermore, when extending LSRM to inverse rendering tasks, qualitative and quantitative evaluations on widely-used benchmarks demonstrate consistent improvements in texture and geometry details, achieving an LPIPS that matches or exceeds that of SOTA dense-view optimization methods. Code and model will be released on our project page.
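The abstract's second contribution, routing by explicit geometric distance rather than by attention scores, can be illustrated with a small sketch. This is not the authors' implementation, which the abstract does not specify. It assumes a pinhole camera, one view, and image tokens grouped into blocks represented by their pixel-space centers. The function name `route_by_geometric_distance` and the parameters `block_centers_px` and `top_k` are hypothetical.

```python
import numpy as np


def route_by_geometric_distance(points_world, K, world_to_cam, block_centers_px, top_k=2):
    """Pick, for each 3D point, the image-token blocks closest to its 2D projection.

    Illustrative assumption only: pinhole projection and Euclidean pixel distance.

    points_world:     (N, 3) 3D positions of object tokens in world space
    K:                (3, 3) pinhole intrinsics
    world_to_cam:     (4, 4) rigid transform from world to camera space
    block_centers_px: (B, 2) pixel coordinates of image-token block centers
    returns:          (N, top_k) block indices, nearest first
    """
    n = points_world.shape[0]
    # Homogeneous coordinates so the 4x4 transform can be applied directly.
    homog = np.concatenate([points_world, np.ones((n, 1))], axis=1)
    cam = (world_to_cam @ homog.T).T[:, :3]
    # Perspective projection; assumes every point lies in front of the camera (z > 0).
    pix_h = (K @ cam.T).T
    pix = pix_h[:, :2] / pix_h[:, 2:3]
    # Pairwise pixel distance between each projected point and each block center.
    dists = np.linalg.norm(pix[:, None, :] - block_centers_px[None, :, :], axis=-1)
    # The selection depends only on geometry, not on learned attention scores.
    return np.argsort(dists, axis=1)[:, :top_k]
```

In a sparse-attention setting, the returned indices would decide which key/value blocks each object token attends to. A real system would also need to handle multiple views, occlusion, and points that project outside the image, none of which this sketch addresses.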


Related papers