Fugu-MT Paper Translation (Summary): Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching

Paper Summary: Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching

arxiv url: http://arxiv.org/abs/2606.24457v1
Date: Tue, 23 Jun 2026 11:45:09 GMT
Status: Translation complete
Last updated (internal): 2026-06-24 22:16:48.931124
Title: Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching
Authors: Junpeng Jing, Ronglai Zuo, Zhelun Shen, Shangchen Zhou, Rolandos Alexandros Potamias, Stefanos Zafeiriou, Krystian Mikolajczyk, Jiankang Deng
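Abstract summary: Lite Any Stereo V2 (LAS2) is an ultra-fast model series designed for efficient zero-shot stereo matching. LAS2 is developed from both architecture and training perspectives. LAS2-H achieves stronger overall zero-shot performance than the iterative Fast-FoundationStereo.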
Reference score (in-house attention metric): 96.63979376005277
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in stereo matching have achieved remarkable accuracy, but often rely on large models, heavy computation, or additional foundation-model priors, making them difficult to deploy on resource-constrained platforms. In contrast, efficient stereo models offer faster inference but are commonly considered less capable of strong zero-shot generalization. In this paper, we challenge this assumption by introducing Lite Any Stereo V2 (LAS2), an ultra-fast model series designed for efficient zero-shot stereo matching. LAS2 is developed from both architecture and training perspectives. Architecturally, we revisit efficient stereo design under practical deployment settings and propose a 2D-only cost aggregation framework, optimized for real inference latency rather than theoretical MACs alone. For training, we develop a three-stage strategy that combines synthetic supervision, self-distillation, and real-world knowledge distillation. To improve the reliability of real-world pseudo supervision, we further introduce pseudo-label filtering and an error-clamping operation, enabling smoother synthetic-to-real transfer. We instantiate LAS2 as a family of models, including feed-forward variants for different efficiency budgets and an iterative variant for higher accuracy. Extensive experiments show that LAS2 achieves state-of-the-art accuracy among efficient stereo methods while maintaining significantly lower latency. Specifically, LAS2-H achieves stronger overall zero-shot performance than the iterative method Fast-FoundationStereo, with 1.8x and 2.7x faster inference on H200 and Orin, respectively. The project page, demos, and code are available at https://tomtomtommi.github.io/LiteAnyStereoV2/.
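The abstract names its components but does not specify them. The sketch below illustrates two of them under explicit assumptions, using NumPy. The first is a correlation cost volume that keeps disparity as channels, so the volume stays a plain 2D tensor that 2D convolutions alone could aggregate. The second is a distillation loss on real-world pseudo labels with filtering and error clamping. The function names, the correlation formulation, the confidence-threshold filter, and the clamp value `max_err` are hypothetical choices made for illustration, not the LAS2 implementation.

```python
import numpy as np


def correlation_cost_volume(feat_left: np.ndarray, feat_right: np.ndarray,
                            max_disp: int) -> np.ndarray:
    """Correlation cost volume with disparity stacked as channels (sketch).

    feat_left, feat_right: rectified feature maps of shape (C, H, W).
    Returns shape (max_disp, H, W): entry [d, y, x] is the channel mean of
    feat_left[:, y, x] * feat_right[:, y, x - d]. Columns x < d have no
    valid match and stay 0. The result is an ordinary 2D (channels, H, W)
    tensor, so it could be refined with 2D convolutions only.
    """
    c, h, w = feat_left.shape
    assert feat_right.shape == (c, h, w) and 0 < max_disp <= w
    cost = np.zeros((max_disp, h, w), dtype=feat_left.dtype)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d.
        cost[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, : w - d]).mean(axis=0)
    return cost


def filtered_clamped_l1(student_disp: np.ndarray, teacher_disp: np.ndarray,
                        teacher_conf: np.ndarray, conf_thresh: float = 0.5,
                        max_err: float = 10.0) -> float:
    """Hypothetical distillation loss on real-world pseudo labels.

    Pseudo-label filtering (assumed form): keep pixels whose teacher
    confidence is >= conf_thresh and whose pseudo label is finite.
    Error clamping (assumed form): cap each per-pixel |error| at max_err,
    so a few badly wrong pseudo labels cannot dominate the mean.
    """
    valid = (teacher_conf >= conf_thresh) & np.isfinite(teacher_disp)
    if not valid.any():
        return 0.0
    err = np.minimum(np.abs(student_disp - teacher_disp), max_err)
    return float(err[valid].mean())
```

For instance, with the default `max_err=10.0`, a valid pixel where the student and the pseudo label disagree by 40 contributes 10 to the mean instead of 40. Under this assumed form, clamping trades some signal from genuinely large errors for robustness to noisy teacher outputs.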

Related Papers