Fugu-MT paper translation (abstract): Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence

Paper abstract: Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence

arxiv url: http://arxiv.org/abs/2603.16822v1
Date: Tue, 17 Mar 2026 17:27:32 GMT
Status: translation complete
System update date: 2026-03-18 17:42:07.452188
Title: Surg$Σ$: A Spectrum of Large-Scale Multimodal Data and Foundation Models for Surgical Intelligence
Title (reference translation): Surg$Σ$: A spectrum of large-scale multimodal data and foundation models for surgical intelligence
Authors: Zhitao Zeng, Mengya Xu, Jian Jiang, Pengfei Guo, Yunqiu Xu, Zhu Zhuo, Chang Han Low, Yufan He, Dong Yang, Chenxi Lin, Yiming Gu, Jiaxin Guo, Yutong Ban, Daguang Xu, Qi Dou, Yueming Jin
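Abstract summary: We introduce Surg$Σ$, a spectrum of large-scale multimodal data and foundation models for surgical intelligence. At the core of this framework is Surg$Σ$-DB, a large-scale multimodal data foundation designed to support diverse surgical tasks. We provide empirical evidence through recently developed surgical foundation models built upon Surg$Σ$-DB.
Reference score (independently computed attention score): 40.457040350909004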
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Surgical intelligence has the potential to improve the safety and consistency of surgical care, yet most existing surgical AI frameworks remain task-specific and struggle to generalize across procedures and institutions. Although multimodal foundation models, particularly multimodal large language models, have demonstrated strong cross-task capabilities across various medical domains, their advancement in surgery remains constrained by the lack of large-scale, systematically curated multimodal data. To address this challenge, we introduce Surg$Σ$, a spectrum of large-scale multimodal data and foundation models for surgical intelligence. At the core of this framework lies Surg$Σ$-DB, a large-scale multimodal data foundation designed to support diverse surgical tasks. Surg$Σ$-DB consolidates heterogeneous surgical data sources (including open-source datasets, curated in-house clinical collections and web-source data) into a unified schema, aiming to improve label consistency and data standardization across heterogeneous datasets. Surg$Σ$-DB spans 6 clinical specialties and diverse surgical types, providing rich image- and video-level annotations across 18 practical surgical tasks covering understanding, reasoning, planning, and generation, at an unprecedented scale (over 5.98M conversations). Beyond conventional multimodal conversations, Surg$Σ$-DB incorporates hierarchical reasoning annotations, providing richer semantic cues to support deeper contextual understanding in complex surgical scenarios. We further provide empirical evidence through recently developed surgical foundation models built upon Surg$Σ$-DB, illustrating the practical benefits of large-scale multimodal annotations, unified semantic design, and structured reasoning annotations for improving cross-task generalization and interpretability.
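Abstract (reference translation): Surgical intelligence has the potential to improve the safety and consistency of surgical care, but most existing surgical AI frameworks are task-specific and struggle to generalize across procedures and institutions. Multimodal foundation models, particularly multimodal large language models, have shown strong cross-task capabilities across various medical domains, yet their progress in surgery is limited by the lack of large-scale, systematically curated multimodal data. To address this challenge, we introduce Surg$Σ$, a spectrum of large-scale multimodal data and foundation models for surgical intelligence. At the core of this framework is Surg$Σ$-DB, a large-scale multimodal data foundation designed to support diverse surgical tasks. Surg$Σ$-DB integrates heterogeneous surgical data sources (including open-source datasets, in-house clinical collections, and web-sourced data) into a unified schema, with the aim of improving label consistency and data standardization across heterogeneous datasets. Surg$Σ$-DB spans 6 clinical specialties and diverse surgical types, and provides image- and video-level annotations for 18 practical surgical tasks covering understanding, reasoning, planning, and generation (over 5.98M conversations). Beyond conventional multimodal conversations, Surg$Σ$-DB incorporates hierarchical reasoning annotations, which provide richer semantic cues to support deeper contextual understanding in complex surgical scenarios. Using recently developed surgical foundation models built upon Surg$Σ$-DB, we also examine the practical benefits of large-scale multimodal annotations, unified semantic design, and structured reasoning annotations for improving cross-task generalization and interpretability.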

Related papers