Fugu-MT Paper Translation (Summary): CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation

Paper summary: CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation

arxiv url: http://arxiv.org/abs/2604.10410v2
Date: Wed, 15 Apr 2026 21:33:56 GMT
Status: Translation complete
Last updated in system: 2026-04-17 16:09:14.143289
Title: CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation
Title (reference translation): CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation
Authors: Shantam Srivastava, Mahesh Bhosale, David Doermann, Mingchen Gao,
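Abstract summary: Category-Wise Contrastive Decoding (CWCD) is a novel, modular framework designed to enhance structured radiology report generation (SRRG). CWCD consistently outperforms baseline methods on both clinical efficacy and natural language generation metrics.
Reference score (attention score computed by the site): 5.6071155906499115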
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Interpreting chest X-rays is inherently challenging due to the overlap between anatomical structures and the subtle presentation of many clinically significant pathologies, making accurate diagnosis time-consuming even for experienced radiologists. Recent radiology-focused foundation models, such as LLaVA-Rad and Maira-2, have positioned multi-modal large language models (MLLMs) at the forefront of automated radiology report generation (RRG). However, despite these advances, current foundation models generate reports in a single forward pass. This decoding strategy diminishes attention to visual tokens and increases reliance on language priors as generation proceeds, which in turn introduces spurious pathology co-occurrences in the generated reports. To mitigate these limitations, we propose Category-Wise Contrastive Decoding (CWCD), a novel and modular framework designed to enhance structured radiology report generation (SRRG). Our approach introduces category-specific parameterization and generates category-wise reports by contrasting normal X-rays with masked X-rays using category-specific visual prompts. Experimental results demonstrate that CWCD consistently outperforms baseline methods across both clinical efficacy and natural language generation metrics. An ablation study further elucidates the contribution of each architectural component to overall performance.
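Abstract (reference translation): Interpreting chest X-rays is inherently difficult because anatomical structures overlap and many clinically important pathologies present subtly, so accurate diagnosis takes time even for experienced radiologists. Recent radiology-focused foundation models such as LLaVA-Rad and Maira-2 have placed multimodal large language models (MLLMs) at the forefront of automated radiology report generation (RRG). Despite these advances, however, current foundation models generate reports in a single forward pass. As generation proceeds, this decoding strategy reduces attention to visual tokens and increases reliance on language priors, which introduces spurious pathology co-occurrences into the generated reports. To mitigate these limitations, we propose Category-Wise Contrastive Decoding (CWCD), a novel, modular framework aimed at improving structured radiology report generation (SRRG). The proposed method introduces category-specific parameterization and generates category-wise reports by contrasting normal X-rays with masked X-rays using category-specific visual prompts. Experimental results show that CWCD consistently outperforms baseline methods on both clinical efficacy and natural language generation metrics. An ablation study further clarifies the contribution of each architectural component to overall performance.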
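The abstract does not give CWCD's decoding formula. The following is a minimal, hypothetical sketch of one generic contrastive-decoding step in the style of Visual Contrastive Decoding (VCD). It assumes next-token logits from the full X-ray can be contrasted with logits from a masked X-ray as `(1 + alpha) * full - alpha * masked`, under an adaptive plausibility cutoff `beta`. The function name, `alpha`, and `beta` are illustrative assumptions, not the paper's API. The per-category prompts and parameterization that CWCD adds are not modeled here.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_next_token(
    logits_full: List[float],
    logits_masked: List[float],
    alpha: float = 1.0,
    beta: float = 0.1,
) -> int:
    """Hypothetical VCD-style step: pick the next token by contrasting logits
    conditioned on the full X-ray with logits conditioned on a masked X-ray.

    alpha: contrast strength (assumed hyperparameter).
    beta: plausibility cutoff relative to the most likely token (assumed).
    """
    if len(logits_full) != len(logits_masked):
        raise ValueError("both logit vectors must cover the same vocabulary")
    logp_full = log_softmax(logits_full)
    # Adaptive plausibility constraint: only keep tokens whose probability under
    # the full image is at least beta times the top token's probability.
    cutoff = max(logp_full) + math.log(beta)
    best_token, best_score = -1, -math.inf
    for tok, (lf, lm) in enumerate(zip(logits_full, logits_masked)):
        if logp_full[tok] < cutoff:
            continue
        # Boost evidence that depends on the visible image; penalize tokens the
        # model predicts just as strongly when the image region is masked
        # (i.e. tokens driven mostly by language priors).
        score = (1 + alpha) * lf - alpha * lm
        if score > best_score:
            best_token, best_score = tok, score
    return best_token


if __name__ == "__main__":
    # Token 0 is favored even with the image masked (a language-prior token);
    # token 1 depends on the visible image, so contrastive decoding prefers it.
    full, masked = [2.0, 1.9, -3.0], [2.5, 0.5, -3.0]
    print(contrastive_next_token(full, masked))             # contrastive choice
    print(contrastive_next_token(full, masked, alpha=0.0))  # plain greedy choice
```

With `alpha=0.0` the step reduces to ordinary greedy decoding on the full image. The plausibility cutoff stops the contrast from promoting tokens that are very unlikely under the full image.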

Related papers