Fugu-MT Paper Translation (Summary): Learning Surgical Robotic Manipulation with 3D Spatial Priors

Paper overview: Learning Surgical Robotic Manipulation with 3D Spatial Priors

arxiv url: http://arxiv.org/abs/2603.03798v1
Date: Wed, 04 Mar 2026 07:19:56 GMT
Status: Translation complete
Last updated in system: 2026-03-23 08:17:41.88306
Title: Learning Surgical Robotic Manipulation with 3D Spatial Priors
Authors: Yu Sheng, Lidian Wang, Xiaomeng Chu, Jiajun Deng, Min Cheng, Yanyong Zhang, Bei Hua, Houqiang Li, Jianmin Ji
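Abstract summary: This paper introduces SST, an end-to-end visuomotor policy that equips surgical robots with 3D spatial awareness. A powerful geometric transformer is fine-tuned on Surgical3D to extract robust 3D latent representations from stereo endoscopic images. SST achieves state-of-the-art performance and strong spatial generalization on complex surgical tasks.
Reference score (site-computed attention score): 73.00031539525202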
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Achieving 3D spatial awareness is crucial for surgical robotic manipulation, where precise and delicate operations are required. Existing methods either explicitly reconstruct the surgical scene prior to manipulation, or enhance multi-view features by adding wrist-mounted cameras to supplement the default stereo endoscopes. However, both paradigms suffer from notable limitations: the former easily leads to error accumulation and prevents end-to-end optimization due to its multi-stage nature, while the latter is rarely adopted in clinical practice since wrist-mounted cameras can interfere with the motion of surgical robot arms. In this work, we introduce the Spatial Surgical Transformer (SST), an end-to-end visuomotor policy that empowers surgical robots with 3D spatial awareness by directly exploring 3D spatial cues embedded in endoscopic images. First, we build Surgical3D, a large-scale photorealistic dataset containing 30K stereo endoscopic image pairs with accurate 3D geometry, addressing the scarcity of 3D data in surgical scenes. Based on Surgical3D, we fine-tune a powerful geometric transformer to extract robust 3D latent representations from stereo endoscopic images. These representations are then seamlessly aligned with the robot's action space via a lightweight multi-level spatial feature connector (MSFC), all within an endoscope-centric coordinate frame. Extensive real-robot experiments demonstrate that SST achieves state-of-the-art performance and strong spatial generalization on complex surgical tasks such as knot tying and ex-vivo organ dissection, representing a significant step toward practical clinical deployment. The dataset and code will be released.
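SST is described as exploiting the 3D spatial cues embedded in stereo endoscopic images rather than reconstructing the scene in a separate stage. The most elementary such cue is disparity: in a rectified stereo pair, depth follows Z = f·B/d, where f is the focal length in pixels, B the baseline and d the disparity. The sketch below is background only, assuming a rectified pinhole stereo model with known intrinsics; the function name `disparity_to_points` and its parameters are hypothetical. It back-projects a disparity map into 3D points in the left endoscope camera frame, the kind of per-pixel geometry the abstract says Surgical3D pairs with each stereo image. It is not SST's geometric transformer, which per the abstract learns 3D latent representations directly from the image pair.

```python
import numpy as np


def disparity_to_points(disparity, fx, fy, cx, cy, baseline):
    """Back-project a rectified left-view disparity map to 3D points.

    Hypothetical helper (not from the paper). Assumes a rectified pinhole
    stereo pair.
    disparity: (H, W) array in pixels; values <= 0 are treated as invalid.
    fx, fy, cx, cy: left-camera intrinsics in pixels.
    baseline: distance between the two camera centres; output depth uses
              the same unit.
    Returns an (H, W, 3) array of XYZ in the left camera frame, NaN where
    the disparity is invalid.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    h, w = disparity.shape
    valid = disparity > 0

    # Rectified stereo relation: Z = f * B / d (only where d > 0).
    depth = np.full((h, w), np.nan)
    depth[valid] = fx * baseline / disparity[valid]

    # Pixel grid: u indexes columns, v indexes rows (meshgrid 'xy' default).
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Pinhole back-projection; NaN depth propagates to X and Y.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)
```

Depth comes out in the unit of `baseline`, so a baseline in millimetres yields points in millimetres. Invalid pixels are marked NaN rather than dropped, so the output keeps the image layout.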
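The abstract states that the 3D representations are aligned with the robot's action space within an endoscope-centric coordinate frame, without giving details. A common way to set up such a frame, assumed here rather than taken from the paper, is to express end-effector positions in the camera frame through a rigid 4×4 transform from the robot base to the endoscope, for example one obtained by hand-eye calibration. All names below (`make_transform`, `base_to_camera`, `delta_base_to_camera`, `camera_to_base`, `T_cam_from_base`) are hypothetical.

```python
import numpy as np


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def base_to_camera(points_base, T_cam_from_base):
    """Map (N, 3) positions from the robot base frame to the endoscope frame.

    Hypothetical helper: T_cam_from_base is assumed to come from a
    hand-eye calibration, which the source does not describe.
    """
    pts = np.asarray(points_base, dtype=np.float64)
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (N, 4)
    return (T_cam_from_base @ homo.T).T[:, :3]


def delta_base_to_camera(deltas_base, T_cam_from_base):
    """Map (N, 3) translation increments between frames.

    Increments are direction vectors, so only the rotation applies;
    adding the translation here would be a bug.
    """
    R = T_cam_from_base[:3, :3]
    return np.asarray(deltas_base, dtype=np.float64) @ R.T


def camera_to_base(points_cam, T_cam_from_base):
    """Map camera-frame positions back to the base frame for execution."""
    return base_to_camera(points_cam, np.linalg.inv(T_cam_from_base))
```

Positions need the full homogeneous transform, whereas translation increments only get rotated; treating an increment as a point would wrongly add the base-to-camera offset to every motion. Targets expressed in the camera frame can be mapped back to the base frame for execution with the inverse transform, as `camera_to_base` does.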

Related papers