Fugu-MT Paper Translation (Summary): TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis

Paper summary: TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis

arxiv url: http://arxiv.org/abs/2603.05867v2
Date: Mon, 09 Mar 2026 11:51:22 GMT
Status: Translation complete
In-system update date: 2026-03-10 15:13:12.500821
Title: TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
Title (reference translation): Interleaved Multimodal Chain-of-Thought Reasoning for Malignant Tumor Diagnosis
Authors: Sijing Li, Zhongwei Qiu, Jiang Liu, Wenqiao Zhang, Tianwei Lin, Yihan Xie, Jianxiang An, Boxiang Yun, Chenglin Yang, Jun Xiao, Guangyu Guo, Jiawen Yao, Wei Liu, Yuan Gao, Ke Yan, Weiwei Cao, Zhilin Zheng, Tony C. W. Mok, Kai Cao, Yu Shi, Jiuyu Zhang, Jian Zhou, Beng Chin Ooi, Yingda Xia, Ling Zhang,
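Abstract summary: TumorChain is a multimodal interleaved reasoning framework that tightly couples 3D imaging encoders, clinical text understanding, and organ-level vision-language alignment. In experiments, it shows consistent improvements over strong baselines in lesion detection, impression generation, and pathology classification, and demonstrates strong generalization on the DeepTumorVQA benchmark.
Reference score (independently calculated attention score): 46.04720262017957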
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate tumor analysis is central to clinical radiology and precision oncology, where early detection, reliable lesion characterization, and pathology-level risk assessment guide diagnosis and treatment planning. Chain-of-Thought (CoT) reasoning is particularly important in this setting because it enables step-by-step interpretation from imaging findings to clinical impressions and pathology conclusions, improving traceability and reducing diagnostic errors. Here, we target the clinical tumor analysis task and build a large-scale benchmark that operationalizes a multimodal reasoning pipeline, spanning findings, impressions, and pathology predictions. We curate TumorCoT, a large-scale dataset of 1.5M CoT-labeled VQA instructions paired with 3D CT scans, with step-aligned rationales and cross-modal alignments along the trajectory from findings to impression to pathology, enabling evaluation of both answer accuracy and reasoning consistency. We further propose TumorChain, a multimodal interleaved reasoning framework that tightly couples 3D imaging encoders, clinical text understanding, and organ-level vision-language alignment. Through cross-modal alignment and iterative interleaved causal reasoning, TumorChain grounds visual evidence, aggregates conclusions, and issues pathology predictions after multiple rounds of self-refinement, improving traceability and reducing hallucination risk. Experiments show consistent improvements over strong baselines in lesion detection, impression generation, and pathology classification, and demonstrate strong generalization on the DeepTumorVQA benchmark. These results highlight the potential of multimodal reasoning for reliable and interpretable tumor analysis in clinical practice. Detailed information about our project can be found on our project homepage at https://github.com/ZJU4HealthCare/TumorChain.
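Abstract (reference translation): Accurate tumor analysis is central to clinical radiology and precision oncology, where early detection, reliable lesion characterization, and pathology-level risk assessment guide diagnosis and treatment planning. Chain-of-Thought (CoT) reasoning is particularly important in this setting because it enables step-by-step interpretation from imaging findings to clinical impressions and pathological conclusions, improving traceability and reducing diagnostic errors. In this work, we target the clinical tumor analysis task and build a large-scale benchmark that operationalizes a multimodal reasoning pipeline. We curate TumorCoT, a large-scale dataset that pairs 1.5M CoT-labeled VQA instructions with 3D CT scans. We further propose TumorChain, a multimodal interleaved reasoning framework that tightly couples 3D imaging encoders, clinical text understanding, and organ-level vision-language alignment. Through cross-modal alignment and iterative interleaved causal reasoning, TumorChain grounds visual evidence, aggregates conclusions, and issues pathology predictions after multiple rounds of self-refinement, improving traceability and reducing hallucination risk. Experiments show consistent improvements over strong baselines in lesion detection, impression generation, and pathology classification, and demonstrate strong generalization on the DeepTumorVQA benchmark. These results highlight the potential of multimodal reasoning for reliable and interpretable tumor analysis in clinical practice. Detailed information about the project is available on the project homepage at https://github.com/ZJU4HealthCare/TumorChain.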

Related papers