Fugu-MT Paper Translation (Summary): Semantic Context-aware mOdality fUsion Transformer (SCOUT): A Context-Aware Multimodal Transformer for Concept-Grounded Pathology Report Generation

Paper Summary: Semantic Context-aware mOdality fUsion Transformer (SCOUT): A Context-Aware Multimodal Transformer for Concept-Grounded Pathology Report Generation

arxiv url: http://arxiv.org/abs/2605.01144v1
Date: Fri, 01 May 2026 22:40:24 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:49.608853
Title: Semantic Context-aware mOdality fUsion Transformer (SCOUT): A Context-Aware Multimodal Transformer for Concept-Grounded Pathology Report Generation
Authors: Suryakant Singh, Saarthak Kapse, Joel Saltz, Prateek Prasanna
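Abstract summary: SCOUT (Semantic Context-aware mOdality fUsion Transformer) is a context-aware, concept-grounded multimodal framework for pathology report generation. The method integrates local histological patterns, whole-slide context, and expert-curated semantic descriptors within a unified learning paradigm. By combining depth-aware contextual modulation with adaptive multimodal fusion during text generation, it produces clinically coherent reports.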
Reference score (independently computed attention score): 6.938242893061667
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Whole-slide images (WSIs) present a fundamental challenge for computational pathology due to their extreme resolution, multi-scale heterogeneity, and the requirement for clinically reliable interpretation. Although recent pathology foundation models have enabled fluent report generation, they often lack clinical grounding, failing to accurately represent key diagnostic concepts and relationships observed by pathologists. This limitation arises from the difficulty of integrating heterogeneous visual evidence spanning fine-grained cellular patterns, slide-level tissue architecture, and high-level diagnostic concepts, while maintaining interpretability and clinical coherence. Here we present SCOUT: Semantic Context-aware mOdality fUsion Transformer, a context-aware, concept-grounded multimodal framework for pathology report generation that enables progressive conditioning of image representations by global slide information and explicit diagnostic concepts. The method integrates local histological patterns, whole-slide context, and expert-curated semantic descriptors within a unified learning paradigm, allowing visual features to be dynamically refined throughout the encoding process. By combining depth-aware contextual modulation with adaptive multimodal fusion during text generation, the framework produces clinically coherent reports while preserving complementarity across representational scales. Using CONCH1.5 features, we evaluate SCOUT against WSI-Caption, HistGen, and BiGen on TCGA-BRCA, MICCAI REG, and HistAI. SCOUT achieves the best BLEU-1 to BLEU-4 and METEOR scores on all datasets, plus the best ROUGE-L on TCGA-BRCA and MICCAI REG. On TCGA-BRCA, it reaches 0.436/0.303/0.202/0.156 BLEU-1/2/3/4 and 0.204 METEOR; on REG 2025, it achieves 0.865/0.834/0.805/0.780 BLEU-1/2/3/4 and 0.568 METEOR. These results support progressive contextual conditioning for grounded pathology report generation.
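The abstract names two mechanisms, depth-aware contextual modulation of image features and adaptive multimodal fusion, but does not specify their form. As an illustration only, the sketch below substitutes two common stand-ins: FiLM-style feature-wise modulation (a scale and shift computed from a context vector) whose strength grows with encoder depth, and an input-dependent sigmoid gate that mixes a visual vector with a concept vector. None of this is SCOUT's actual architecture. The function names (`film_modulate`, `depth_weight`, `encode`, `gated_fusion`), the linear depth schedule, the use of the mean patch feature as the slide-level context, and the concatenation of that context with a concept embedding are all hypothetical choices.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean; used here as a hypothetical slide-level context."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def depth_weight(layer: int, num_layers: int) -> float:
    """Hypothetical depth schedule: conditioning strength rises linearly to 1.0."""
    return (layer + 1) / num_layers


def film_modulate(x: Vector, gamma: Vector, beta: Vector, strength: float) -> Vector:
    """FiLM-style update x * (1 + s*gamma) + s*beta; s = 0 leaves x unchanged."""
    return [xi * (1.0 + strength * g) + strength * b for xi, g, b in zip(x, gamma, beta)]


def encode(patches: List[Vector], concept: Vector, w_gamma: Matrix, w_beta: Matrix,
           num_layers: int) -> List[Vector]:
    """Progressively condition patch features on slide context plus a concept embedding."""
    for layer in range(num_layers):
        # Recompute the slide context from the current (already modulated) patches.
        cond = mean_vector(patches) + concept  # list concatenation: length 2d
        gamma, beta = linear(w_gamma, cond), linear(w_beta, cond)
        s = depth_weight(layer, num_layers)
        patches = [film_modulate(p, gamma, beta, s) for p in patches]
    return patches


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(visual: Vector, concept: Vector, w_gate: Vector) -> Vector:
    """Input-dependent gate g = sigmoid(w_gate . [visual; concept]) mixes the two vectors."""
    g = sigmoid(sum(w * x for w, x in zip(w_gate, visual + concept)))
    return [g * v + (1.0 - g) * c for v, c in zip(visual, concept)]


if __name__ == "__main__":
    d = 2
    patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy patch features
    concept = [0.2, -0.1]                          # toy concept embedding
    w_gamma = [[0.1] * (2 * d) for _ in range(d)]  # toy (untrained) projections
    w_beta = [[0.0] * (2 * d) for _ in range(d)]
    encoded = encode(patches, concept, w_gamma, w_beta, num_layers=3)
    fused = gated_fusion(mean_vector(encoded), concept, w_gate=[0.3] * (2 * d))
    print(encoded, fused)
```

With all-zero projection weights the modulation is the identity and a zero gate gives an equal 50/50 mix; in a trained model both would be learned and vary per input. The linear depth schedule is one arbitrary way to make conditioning depend on layer depth and is not taken from the paper.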
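The results above are reported as BLEU-1 through BLEU-4. Under the standard BLEU definition, BLEU-n is the geometric mean of clipped (modified) n-gram precisions for orders 1 through n, multiplied by a brevity penalty that punishes outputs shorter than the reference. The minimal sketch below shows that computation for a single candidate and a single reference, with uniform weights and no smoothing. The abstract does not state which BLEU implementation, tokenization, or aggregation the authors used, so this sketch is not expected to reproduce the reported numbers; the function names are illustrative.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """Modified n-gram precision: candidate counts are clipped by reference counts."""
    cand = ngram_counts(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    ref = ngram_counts(reference, n)
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate: Sequence[str], reference: Sequence[str], max_n: int) -> float:
    """Cumulative BLEU-max_n with uniform weights, one reference, no smoothing."""
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 in this case
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    reference = "invasive ductal carcinoma grade 2".split()
    candidate = "invasive ductal carcinoma grade 3".split()
    for n in range(1, 5):
        print(f"BLEU-{n}: {bleu(candidate, reference, n):.3f}")
```

Corpus-level BLEU, which most evaluation toolkits report, pools the clipped n-gram counts and the candidate and reference lengths over all reports before taking the geometric mean, so it is generally not equal to the average of per-report scores.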


Related papers