Fugu-MT Paper Translation (Abstract): RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores

Paper overview: RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores

arxiv url: http://arxiv.org/abs/2508.15464v1
Date: Thu, 21 Aug 2025 11:34:30 GMT
Status: Translation complete
Last updated in system: 2025-08-22 16:26:46.297789
Title: RadReason: Radiology Report Evaluation Metric with Reasons and Sub-Scores
Authors: Yingshu Li, Yunyi Liu, Lingqiao Liu, Lei Wang, Luping Zhou
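Abstract summary: We introduce RadReason, a novel evaluation framework for radiology reports. It outputs fine-grained sub-scores across six clinically defined error types and also produces human-readable justifications that explain the rationale behind each score.
Reference score (attention metric computed independently by this site): 37.16761198532088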
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Evaluating automatically generated radiology reports remains a fundamental challenge due to the lack of clinically grounded, interpretable, and fine-grained metrics. Existing methods either produce coarse overall scores or rely on opaque black-box models, limiting their usefulness in real-world clinical workflows. We introduce RadReason, a novel evaluation framework for radiology reports that not only outputs fine-grained sub-scores across six clinically defined error types, but also produces human-readable justifications that explain the rationale behind each score. Our method builds on Group Relative Policy Optimization and incorporates two key innovations: (1) Sub-score Dynamic Weighting, which adaptively prioritizes clinically challenging error types based on live F1 statistics; and (2) Majority-Guided Advantage Scaling, which adjusts policy gradient updates based on prompt difficulty derived from sub-score agreement. Together, these components enable more stable optimization and better alignment with expert clinical judgment. Experiments on the ReXVal benchmark show that RadReason surpasses all prior offline metrics and achieves parity with GPT-4-based evaluations, while remaining explainable, cost-efficient, and suitable for clinical deployment. Code will be released upon publication.
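The abstract names the training components but gives no formulas, so the following is only a minimal sketch of how they could fit together, not the authors' implementation. The group-relative normalization is the usual Group Relative Policy Optimization step. Everything else is an assumption made for illustration: the weighting rule `(1 - F1) + floor`, the agreement-based factor `2 - agreement`, the exact-match reward, the function names, and the placeholder error-type keys.

```python
from collections import Counter
from statistics import mean, pstdev


def dynamic_weights(f1_by_type, floor=0.05):
    """ASSUMED rule: weight each error type by (1 - F1) + floor, then normalize.

    Error types with a lower running F1 (harder ones) receive a larger weight.
    """
    raw = {k: (1.0 - f1) + floor for k, f1 in f1_by_type.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def weighted_reward(predicted, reference, weights):
    """ASSUMED reward: weighted fraction of sub-scores that match the reference."""
    return sum(w for k, w in weights.items() if predicted[k] == reference[k])


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization within one prompt's group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def majority_agreement(predictions):
    """Fraction of sampled outputs equal to the most common prediction."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][1] / len(predictions)


def scaled_advantages(rewards, predictions):
    """ASSUMED rule: lower agreement means a harder prompt and a larger update.

    The factor (2 - agreement) equals 1 when every sample agrees and
    approaches 2 as agreement drops.
    """
    scale = 2.0 - majority_agreement(predictions)
    return [scale * a for a in group_relative_advantages(rewards)]
```

For example, calling `scaled_advantages([1.0, 0.0, 1.0, 1.0], preds)`, where `preds` is a list of four hashable sub-score tuples, returns one scaled advantage per sampled output. The rewards would come from `weighted_reward` using weights from `dynamic_weights`, refreshed as the per-type F1 statistics change during training.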

Related papers