Fugu-MT Paper Translation (Abstract): Surg-R1: A Hierarchical Reasoning Foundation Model for Scalable and Interpretable Surgical Decision Support with Multi-Center Clinical Validation

Paper overview: Surg-R1: A Hierarchical Reasoning Foundation Model for Scalable and Interpretable Surgical Decision Support with Multi-Center Clinical Validation

arxiv url: http://arxiv.org/abs/2603.12430v1
Date: Thu, 12 Mar 2026 20:26:28 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:11.752784
Title: Surg-R1: A Hierarchical Reasoning Foundation Model for Scalable and Interpretable Surgical Decision Support with Multi-Center Clinical Validation
Authors: Jian Jiang, Chenxi Lin, Yiming Gu, Zengyi Qin, Zhitao Zeng, Kun Yuan, Yonghao Long, Xiang Xia, Cheng Yuan, Yuqi Wang, Zijie Yue, Kunyi Yang, Yuting Zhang, Zhu Zhuo, Dian Qin, Xin Wang, NG Chi Fai, Brian Anthony, Daguang Xu, Guy Rosman, Ozanan Meireles, Zizhen Zhang, Nicolas Padoy, Hesheng Wang, Qi Dou, Yueming Jin, Yutong Ban
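Abstract summary: Surg-R1 is a surgical vision-language model that addresses the lack of verifiable reasoning in existing surgical models through hierarchical reasoning trained via a four-stage pipeline. The approach makes three key contributions: (1) a three-level reasoning hierarchy that decomposes surgical interpretation into perceptual grounding, relational understanding, and contextual reasoning; (2) the largest surgical chain-of-thought dataset, with 320,000 reasoning pairs; and (3) a four-stage training pipeline progressing from supervised fine-tuning to group relative policy optimization and iterative self-improvement.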
Reference score (independently computed attention score): 51.897472694590356
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Surgical scene understanding demands not only accurate predictions but also interpretable reasoning that surgeons can verify against clinical expertise. However, existing surgical vision-language models generate predictions without reasoning chains, and general-purpose reasoning models fail on compositional surgical tasks without domain-specific knowledge. We present Surg-R1, a surgical Vision-Language Model that addresses this gap through hierarchical reasoning trained via a four-stage pipeline. Our approach introduces three key contributions: (1) a three-level reasoning hierarchy decomposing surgical interpretation into perceptual grounding, relational understanding, and contextual reasoning; (2) the largest surgical chain-of-thought dataset with 320,000 reasoning pairs; and (3) a four-stage training pipeline progressing from supervised fine-tuning to group relative policy optimization and iterative self-improvement. Evaluation on SurgBench, comprising six public benchmarks and six multi-center external validation datasets from five institutions, demonstrates that Surg-R1 achieves the highest Arena Score (64.9%) on public benchmarks versus Gemini 3.0 Pro (46.1%) and GPT-5.1 (37.9%), outperforming both proprietary reasoning models and specialized surgical VLMs on the majority of tasks spanning instrument localization, triplet recognition, phase recognition, action recognition, and critical view of safety assessment, with a 15.2 percentage point improvement over the strongest surgical baseline on external validation.
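The abstract names group relative policy optimization (GRPO) as one training stage but gives no details. As a rough illustration of the general idea only, the sketch below normalizes each sampled answer's reward against the mean and standard deviation of its own sampling group. The function name, the reward values, and the use of the population standard deviation are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return GRPO-style advantages for one group of sampled answers.

    Each reward is centered on the group mean and scaled by the group
    standard deviation, so no separate value network is needed.
    `eps` (an assumed constant) avoids division by zero when every
    answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four answers sampled for the same question
# (1.0 = judged correct, 0.0 = judged incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers that score above their group's average receive a positive advantage and those below receive a negative one; when all answers score the same, every advantage is zero and the group contributes no learning signal.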

Related papers