Fugu-MT Paper Translation (Abstract): Training-free Detection and 6D Pose Estimation of Unseen Surgical Instruments

Paper summary: Training-free Detection and 6D Pose Estimation of Unseen Surgical Instruments

arxiv url: http://arxiv.org/abs/2603.25228v1
Date: Thu, 26 Mar 2026 09:28:19 GMT
Status: translation complete
Last updated in system: 2026-03-27 20:52:48.215831
Title: Training-free Detection and 6D Pose Estimation of Unseen Surgical Instruments
Authors: Jonas Hein, Lilian Calvet, Matthias Seibold, Siyu Tang, Marc Pollefeys, Philipp Fürnstahl
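Abstract summary: This work introduces a training-free pipeline for accurate multi-view 6D pose estimation of unseen surgical instruments. The method was rigorously evaluated on real-world surgical data from the MVPSP dataset.
Reference score (independently computed attention score): 45.387920812487465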
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Purpose: Accurate detection and 6D pose estimation of surgical instruments are crucial for many computer-assisted interventions. However, supervised methods lack flexibility for new or unseen tools and require extensive annotated data. This work introduces a training-free pipeline for accurate multi-view 6D pose estimation of unseen surgical instruments, which only requires a textured CAD model as prior knowledge. Methods: Our pipeline consists of two main stages. First, for detection, we generate object mask proposals in each view and score their similarity to rendered templates using a pre-trained feature extractor. Detections are matched across views, triangulated into 3D instance candidates, and filtered using multi-view geometric consistency. Second, for pose estimation, a set of pose hypotheses is iteratively refined and scored using feature-metric scores with cross-view attention. The best hypothesis undergoes a final refinement using a novel multi-view, occlusion-aware contour registration, which minimizes reprojection errors of unoccluded contour points. Results: The proposed method was rigorously evaluated on real-world surgical data from the MVPSP dataset. The method achieves millimeter-accurate pose estimates that are on par with supervised methods under controlled conditions, while maintaining full generalization to unseen instruments. These results demonstrate the feasibility of training-free, marker-less detection and tracking in surgical scenes, and highlight the unique challenges in surgical environments. Conclusion: We present a novel and flexible pipeline that effectively combines state-of-the-art foundational models, multi-view geometry, and contour-based refinement for high-accuracy 6D pose estimation of surgical instruments without task-specific training. This approach enables robust instrument tracking and scene understanding in dynamic clinical environments.
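The abstract states that 3D instance candidates are filtered using multi-view geometric consistency, but it does not say how the check is performed. As a minimal sketch, assuming calibrated pinhole cameras given as 3x4 projection matrices and at most one 2D detection center per view, a candidate could be kept when it reprojects close to its detections in enough views. The function names, the 5-pixel threshold, and the minimum view count are illustrative assumptions, not values or APIs from the paper.

```python
import math

def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 projection matrix P.

    Returns pixel coordinates (u, v), or None if the point is not in front
    of the camera (assumes P = K [R | t] with positive depth in front).
    """
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    if w <= 0:
        return None
    return (u / w, v / w)

def reprojection_errors(X, cameras, detections):
    """Pixel distance between the projected candidate and each view's detection.

    An entry is None when the view has no detection or the point is behind
    the camera.
    """
    errors = []
    for P, det in zip(cameras, detections):
        proj = project(P, X)
        if det is None or proj is None:
            errors.append(None)
        else:
            errors.append(math.hypot(proj[0] - det[0], proj[1] - det[1]))
    return errors

def is_consistent(X, cameras, detections, max_error_px=5.0, min_views=2):
    """Keep a 3D candidate if enough views agree with it (assumed criterion)."""
    errors = reprojection_errors(X, cameras, detections)
    inliers = sum(1 for e in errors if e is not None and e <= max_error_px)
    return inliers >= min_views

# Two hypothetical cameras: focal length 1000 px, principal point (320, 240).
# The second camera is shifted 0.1 units along the x axis.
P1 = [[1000.0, 0.0, 320.0, 0.0], [0.0, 1000.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
P2 = [[1000.0, 0.0, 320.0, -100.0], [0.0, 1000.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
candidate = (0.0, 0.0, 1.0)

good = is_consistent(candidate, [P1, P2], [(320.0, 240.0), (221.0, 240.0)])
bad = is_consistent(candidate, [P1, P2], [(320.0, 240.0), (260.0, 240.0)])
```

In this toy setup the candidate projects to (320, 240) and (220, 240), so the first set of detections is accepted and the second, which is 40 pixels off in one view, is rejected.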

Related papers