Fugu-MT Paper Translation (Abstract): Calibrated Confidence Expression for Radiology Report Generation

Paper overview: Calibrated Confidence Expression for Radiology Report Generation

arxiv url: http://arxiv.org/abs/2603.29492v1
Date: Tue, 31 Mar 2026 09:37:33 GMT
Status: Translation complete
Last updated in system: 2026-04-01 15:25:03.45345
Title: Calibrated Confidence Expression for Radiology Report Generation
Title (reference translation): Calibrated Confidence Expression for Radiology Report Generation
Authors: David Bani-Harouni, Chantal Pellegrini, Julian Lüers, Su Hwan Kim, Markus Baalmann, Benedikt Wiestler, Rickmer Braren, Nassir Navab, Matthias Keicher
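Abstract summary: Large Vision-Language Models (LVLMs) for radiology report generation require both accurate predictions and clinically interpretable indicators. Current state-of-the-art language models are often overconfident, and research on calibration in multimodal settings such as radiology report generation is limited. This paper introduces ConRad, a reinforcement learning framework for fine-tuning medical LVLMs.
Reference score (attention score computed independently by the site): 33.24673060327421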
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Safe deployment of Large Vision-Language Models (LVLMs) in radiology report generation requires not only accurate predictions but also clinically interpretable indicators of when outputs should be thoroughly reviewed, enabling selective radiologist verification and reducing the risk of hallucinated findings influencing clinical decisions. One intuitive approach to this is verbalized confidence, where the model explicitly states its certainty. However, current state-of-the-art language models are often overconfident, and research on calibration in multimodal settings such as radiology report generation is limited. To address this gap, we introduce ConRad (Confidence Calibration for Radiology Reports), a reinforcement learning framework for fine-tuning medical LVLMs to produce calibrated verbalized confidence estimates alongside radiology reports. We study two settings: a single report-level confidence score and a sentence-level variant assigning a confidence to each claim. Both are trained using the GRPO algorithm with reward functions based on the logarithmic scoring rule, which incentivizes truthful self-assessment by penalizing miscalibration and guarantees optimal calibration under reward maximization. Experimentally, ConRad substantially improves calibration and outperforms competing methods. In a clinical evaluation, we show that ConRad's report-level scores are well aligned with clinicians' judgment. By highlighting full reports or low-confidence statements for targeted review, ConRad can support safer clinical integration of AI assistance for report generation.
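Abstract (reference translation): The safe deployment of LVLMs (Large Vision-Language Models) in radiology report generation requires not only accurate predictions but also clinically interpretable indicators of when outputs should be thoroughly reviewed. One intuitive approach is verbalized confidence, in which the model explicitly states its certainty. However, current language models are often overconfident, and research on calibration in multimodal settings such as radiology report generation is limited. To address this gap, the paper introduces ConRad (Confidence Calibration for Radiology Reports), a reinforcement learning framework for fine-tuning medical LVLMs. Two settings are studied: a single report-level confidence score and a sentence-level variant that assigns a confidence to each claim. Both are trained with the GRPO algorithm using reward functions based on the logarithmic scoring rule, which incentivizes truthful self-assessment by penalizing miscalibration and guarantees optimal calibration under reward maximization. Experimentally, ConRad substantially improves calibration and outperforms competing methods. In a clinical evaluation, ConRad's report-level scores align well with clinicians' judgment. By highlighting full reports or low-confidence statements for targeted review, ConRad can support safer clinical integration of AI assistance for report generation.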
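
The abstract states that rewards based on the logarithmic scoring rule incentivize truthful self-assessment. The following is a minimal sketch of that property, not the paper's reward function: it assumes a binary correct/incorrect label for each report or sentence, and the function names, the clipping constant `eps`, and the grid search are illustrative choices that do not come from the paper.

```python
import math


def log_score_reward(confidence: float, correct: bool, eps: float = 1e-6) -> float:
    """Logarithmic scoring rule: log(c) if the claim is correct, log(1 - c) otherwise.

    `eps` is an assumed clipping constant that keeps the reward finite at 0 and 1.
    """
    c = min(max(confidence, eps), 1.0 - eps)
    return math.log(c) if correct else math.log(1.0 - c)


def expected_reward(confidence: float, p_correct: float) -> float:
    """Expected reward when the claim is actually correct with probability p_correct."""
    return (p_correct * log_score_reward(confidence, True)
            + (1.0 - p_correct) * log_score_reward(confidence, False))


def best_confidence(p_correct: float, steps: int = 99) -> float:
    """Grid-search the stated confidence that maximizes the expected reward."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.01, 0.02, ..., 0.99
    return max(grid, key=lambda c: expected_reward(c, p_correct))


if __name__ == "__main__":
    for p in (0.3, 0.7, 0.9):
        print(f"true accuracy {p:.2f} -> best stated confidence {best_confidence(p):.2f}")
```

Under these assumptions, the expected reward peaks when the stated confidence equals the true probability of being correct, so overstating confidence (for example, 0.99 when the true accuracy is 0.7) lowers the expected reward. This is the sense in which a proper scoring rule rewards calibrated statements.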

Related papers