Fugu-MT Paper Translation (Abstract): Hybrid Offline-Online Reinforcement Learning for Sensorless, High-Precision Force Regulation in Surgical Robotic Grasping

Paper overview: Hybrid Offline-Online Reinforcement Learning for Sensorless, High-Precision Force Regulation in Surgical Robotic Grasping

arxiv url: http://arxiv.org/abs/2602.23870v1
Date: Fri, 27 Feb 2026 10:11:59 GMT
Status: Translation complete
System update date: 2026-03-23 08:17:41.761807
Title: Hybrid Offline-Online Reinforcement Learning for Sensorless, High-Precision Force Regulation in Surgical Robotic Grasping
Title (reference translation): Hybrid Offline-Online Reinforcement Learning for Sensorless, High-Precision Force Regulation in Surgical Robotic Grasping
Authors: Edoardo Fazzari, Omar Mohamed, Khalfan Hableel, Hamdan Alhadhrami, Cesare Stefanini
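Abstract summary: We propose a sensorless control framework that combines physics-consistent modeling with hybrid reinforcement learning. We develop a first-principles digital twin of the da Vinci Xi grasping mechanism that captures coupled electrical, transmission, and jaw dynamics. In simulation, the controller maintains grasp force within 1% of the desired reference during multi-harmonic jaw motion.
Reference score (independently computed attention score): 2.874057693956189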
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Precise grasp force regulation in tendon-driven surgical instruments is fundamentally limited by nonlinear coupling between motor dynamics, transmission compliance, friction, and distal mechanics. Existing solutions typically rely on distal force sensing or analytical compensation, increasing hardware complexity or degrading performance under dynamic motion. We present a sensorless control framework that combines physics-consistent modeling and hybrid reinforcement learning to achieve high-precision distal force regulation in a proximally actuated surgical end-effector. We develop a first-principles digital twin of the da Vinci Xi grasping mechanism that captures coupled electrical, transmission, and jaw dynamics within a unified differential-algebraic formulation. To safely learn control policies in this stiff and highly nonlinear system, we introduce a three-stage pipeline: (i) a receding-horizon CMA-ES oracle that generates dynamically feasible expert trajectories, (ii) fully offline policy learning via Implicit Q-Learning to ensure stable initialization without unsafe exploration, and (iii) online refinement using TD3 for adaptation to on-policy dynamics. The resulting policy directly maps proximal measurements to motor voltages and requires no distal sensing. In simulation, the controller maintains grasp force within 1% of the desired reference during multi-harmonic jaw motion. Hardware experiments demonstrate average force errors below 4% across diverse trajectories, validating sim-to-real transfer. The learned policy contains approximately 71k parameters and executes at kHz rates, enabling real-time deployment. These results demonstrate that high-fidelity modeling combined with structured offline-online RL can recover precise distal force behavior without additional sensing, offering a scalable and mechanically compatible solution for surgical robotic manipulation.
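Abstract (reference translation): Precise grasp force control in tendon-driven surgical instruments is fundamentally limited by the nonlinear coupling among motor dynamics, transmission compliance, friction, and distal mechanics. Existing solutions usually depend on distal force sensing or analytical compensation, which increases hardware complexity or degrades performance under dynamic motion. We propose a sensorless control framework that combines physics-consistent modeling with hybrid reinforcement learning to achieve high-precision distal force control in a proximally actuated surgical end-effector. We develop a first-principles digital twin of the da Vinci Xi grasping mechanism that captures coupled electrical, transmission, and jaw dynamics in a unified differential-algebraic formulation. To learn control policies safely in this stiff, highly nonlinear system, we introduce a three-stage pipeline: (i) a receding-horizon CMA-ES oracle that generates dynamically feasible expert trajectories, (ii) fully offline policy learning with Implicit Q-Learning, which ensures stable initialization without unsafe exploration, and (iii) online refinement with TD3 to adapt to on-policy dynamics. The resulting policy maps proximal measurements directly to motor voltages and needs no distal sensing. In simulation, the controller keeps the grasp force within 1% of the desired reference during multi-harmonic jaw motion. Hardware experiments show average force errors below 4% across diverse trajectories, validating sim-to-real transfer. The learned policy has approximately 71k parameters and runs at kHz rates, enabling real-time deployment. These results show that high-fidelity modeling combined with structured offline-online RL can recover precise distal force behavior without additional sensing, offering a scalable and mechanically compatible solution for surgical robotic manipulation.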
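
Stage (ii) of the pipeline uses Implicit Q-Learning, whose value function is fit with an asymmetric (expectile) squared loss instead of an explicit maximum over actions. The following is a minimal sketch of that loss only, not the authors' implementation. The function names and the value tau = 0.7 are assumptions chosen for illustration; the abstract does not state the hyperparameters used.

```python
def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    diff is Q(s, a) - V(s). tau = 0.7 is an assumed illustrative value,
    not a setting reported in the abstract.
    """
    # Positive residuals get weight tau, negative residuals get 1 - tau.
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def mean_expectile_loss(q_values, v_values, tau=0.7):
    """Average expectile loss over paired Q targets and V predictions."""
    if len(q_values) != len(v_values) or not q_values:
        raise ValueError("q_values and v_values must be non-empty and equal length")
    diffs = [q - v for q, v in zip(q_values, v_values)]
    return sum(expectile_loss(d, tau) for d in diffs) / len(diffs)
```

With tau above 0.5 the loss penalizes underestimates of Q more than overestimates, so V is pulled toward the upper range of Q values seen in the offline data. With tau = 0.5 it reduces to an ordinary squared error scaled by one half.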

Related papers