Fugu-MT paper translation (abstract): KV-Tracker: Real-Time Pose Tracking with Transformers

Paper overview: KV-Tracker: Real-Time Pose Tracking with Transformers

arxiv url: http://arxiv.org/abs/2512.22581v1
Date: Sat, 27 Dec 2025 13:02:30 GMT
Status: Translation complete
System update date: 2025-12-30 22:37:30.121687
Title: KV-Tracker: Real-Time Pose Tracking with Transformers
Title (reference translation): KV-Tracker: Real-Time Pose Tracking with Transformers
Authors: Marwan Taher, Ignacio Alzugaray, Kirill Mazur, Xin Kong, Andrew J. Davison
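Abstract summary: Multi-view 3D geometry networks are powerful but prohibitively slow for real-time applications. We propose a new method that enables real-time 6-DoF pose tracking and online reconstruction of objects and scenes from monocular RGB video.
Reference score (independently computed interest score): 30.32327636560028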
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-view 3D geometry networks offer a powerful prior but are prohibitively slow for real-time applications. We propose a novel way to adapt them for online use, enabling real-time 6-DoF pose tracking and online reconstruction of objects and scenes from monocular RGB videos. Our method rapidly selects and manages a set of images as keyframes to map a scene or object via $π^3$ with full bidirectional attention. We then cache the global self-attention block's key-value (KV) pairs and use them as the sole scene representation for online tracking. This allows for up to $15\times$ speedup during inference without the fear of drift or catastrophic forgetting. Our caching strategy is model-agnostic and can be applied to other off-the-shelf multi-view networks without retraining. We demonstrate KV-Tracker on both scene-level tracking and the more challenging task of on-the-fly object tracking and reconstruction without depth measurements or object priors. Experiments on the TUM RGB-D, 7-Scenes, Arctic and OnePose datasets show the strong performance of our system while maintaining high frame-rates up to ${\sim}27$ FPS.
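Abstract (reference translation): Multi-view 3D geometry networks are powerful but prohibitively slow for real-time applications. We propose a new method that enables real-time 6-DoF pose tracking and online reconstruction of objects and scenes from monocular RGB video. The method rapidly selects and manages a set of images as keyframes and maps the scene or object with $π^3$. It then caches the key-value (KV) pairs of the global self-attention blocks and uses them as the sole scene representation for online tracking. This allows up to a $15\times$ speedup during inference without fear of drift or catastrophic forgetting. The caching strategy is model-agnostic and can be applied to other off-the-shelf multi-view networks without retraining. We demonstrate KV-Tracker on both scene-level tracking and the more challenging task of on-the-fly object tracking and reconstruction without depth measurements or object priors. Experiments on the TUM RGB-D, 7-Scenes, Arctic, and OnePose datasets show strong performance while maintaining high frame rates of up to ${\sim}27$ FPS.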
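The caching idea in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a single attention head and a single layer, random projection weights, and a new frame that attends to the cached keyframe tokens plus its own tokens. The class `KVCacheSketch` and its methods `map_keyframes` and `track` are hypothetical names introduced only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q is (n_q, d); k and v are (n_kv, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


class KVCacheSketch:
    """Toy single-head attention layer with a keyframe KV cache (hypothetical)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for a pretrained network (assumption).
        self.wq, self.wk, self.wv = (
            rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)
        )
        self.k_cache = None
        self.v_cache = None

    def map_keyframes(self, keyframe_tokens):
        # Mapping step: keyframe tokens attend to each other bidirectionally,
        # and their keys and values are stored as the scene representation.
        q = keyframe_tokens @ self.wq
        self.k_cache = keyframe_tokens @ self.wk
        self.v_cache = keyframe_tokens @ self.wv
        return attention(q, self.k_cache, self.v_cache)

    def track(self, frame_tokens):
        # Tracking step: only the new frame is projected; the cached keyframe
        # keys and values are reused and left unmodified.
        q = frame_tokens @ self.wq
        k = np.concatenate([self.k_cache, frame_tokens @ self.wk])
        v = np.concatenate([self.v_cache, frame_tokens @ self.wv])
        return attention(q, k, v)


layer = KVCacheSketch(dim=8)
rng = np.random.default_rng(1)
keyframes = rng.standard_normal((12, 8))  # tokens from all keyframes
frame = rng.standard_normal((4, 8))       # tokens from the incoming frame
layer.map_keyframes(keyframes)
tracked = layer.track(frame)
```

In this single-layer toy, `track` returns the same output for the new frame as recomputing attention over all tokens would, while projecting only the new frame's tokens. In a multi-layer network the two would generally differ, because cached keyframe features are not updated by the new frame.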

Related papers