Fugu-MT paper translation (abstract): A Multi-Pass Large Language Model Framework for Precise and Efficient Radiology Report Error Detection

Paper abstract: A Multi-Pass Large Language Model Framework for Precise and Efficient Radiology Report Error Detection

arxiv url: http://arxiv.org/abs/2506.20112v1
Date: Wed, 25 Jun 2025 04:02:29 GMT
Status: translation complete
Last updated in system: 2025-06-26 21:00:42.597189
Title: A Multi-Pass Large Language Model Framework for Precise and Efficient Radiology Report Error Detection
Authors: Songsoo Kim, Seungtae Lee, See Young Lee, Joonho Kim, Keechan Kan, Dukyong Yoon
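Abstract summary: The positive predictive value (PPV) of large language model (LLM)-based proofreading of radiology reports is limited because error prevalence is low. A three-pass LLM framework significantly improved PPV and reduced operational costs.
Reference score (attention score computed by this site): 1.8604092379196109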
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Background: The positive predictive value (PPV) of large language model (LLM)-based proofreading for radiology reports is limited due to the low error prevalence. Purpose: To assess whether a three-pass LLM framework enhances PPV and reduces operational costs compared with baseline approaches. Materials and Methods: A retrospective analysis was performed on 1,000 consecutive radiology reports (250 each: radiography, ultrasonography, CT, MRI) from the MIMIC-III database. Two external datasets (CheXpert and Open-i) were validation sets. Three LLM frameworks were tested: (1) single-prompt detector; (2) extractor plus detector; and (3) extractor, detector, and false-positive verifier. Precision was measured by PPV and absolute true positive rate (aTPR). Efficiency was calculated from model inference charges and reviewer remuneration. Statistical significance was tested using cluster bootstrap, exact McNemar tests, and Holm-Bonferroni correction. Results: Framework PPV increased from 0.063 (95% CI, 0.036-0.101, Framework 1) to 0.079 (0.049-0.118, Framework 2), and significantly to 0.159 (0.090-0.252, Framework 3; P<.001 vs. baselines). aTPR remained stable (0.012-0.014; P>=.84). Operational costs per 1,000 reports dropped to USD 5.58 (Framework 3) from USD 9.72 (Framework 1) and USD 6.85 (Framework 2), reflecting reductions of 42.6% and 18.5%, respectively. Human-reviewed reports decreased from 192 to 88. External validation supported Framework 3's superior PPV (CheXpert 0.133, Open-i 0.105) and stable aTPR (0.007). Conclusion: A three-pass LLM framework significantly enhanced PPV and reduced operational costs, maintaining detection performance, providing an effective strategy for AI-assisted radiology report quality assurance.
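The three frameworks in the abstract differ only in how many passes a report goes through before it reaches a human reviewer. A minimal sketch of the third framework (extractor, detector, false-positive verifier) follows. The abstract does not say what each pass receives or returns, so the stage signatures, the `Flag` type, and the toy stand-in functions are all assumptions made for illustration; in practice each stage would be an LLM prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Flag:
    """A statement that survived all three passes (hypothetical type)."""
    report_id: int
    statement: str


def three_pass(
    reports: List[str],
    extract: Callable[[str], List[str]],   # pass 1: report -> statements
    detect: Callable[[str], bool],         # pass 2: statement -> suspected error?
    verify: Callable[[str, str], bool],    # pass 3: (report, statement) -> keep flag?
) -> List[Flag]:
    """Flag a statement only if the detector raises it AND the verifier keeps it."""
    flags = []
    for report_id, text in enumerate(reports):
        for statement in extract(text):
            if detect(statement) and verify(text, statement):
                flags.append(Flag(report_id, statement))
    return flags


# Toy stand-ins for the three LLM passes (assumptions, not the authors' prompts).
def toy_extract(text: str) -> List[str]:
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_detect(statement: str) -> bool:
    return "suspect" in statement


def toy_verify(text: str, statement: str) -> bool:
    return "confirmed" in statement


reports = [
    "Lungs are clear. No effusion.",
    "suspect laterality mismatch confirmed. Heart size normal.",
    "suspect typo only. No fracture.",
]
flags = three_pass(reports, toy_extract, toy_detect, toy_verify)
print([f.report_id for f in flags])  # [1]
```

The verifier can only remove flags, never add them, which is why a design like this would be expected to cut false positives (raising PPV) while leaving the number of true detections roughly unchanged.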
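The PPV and cost figures reported in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: `ppv` and `relative_reduction` are hypothetical helper names, and the counts of 14 true positives and 74 false positives are assumed values chosen to be consistent with the 88 human-reviewed reports and the PPV of 0.159 reported for Framework 3. The abstract itself does not give raw counts.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: share of flagged items that are real errors."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when nothing is flagged")
    return true_positives / flagged


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline * 100.0


# Operational cost per 1,000 reports in USD, as reported in the abstract.
cost_f1, cost_f2, cost_f3 = 9.72, 6.85, 5.58

print(f"{relative_reduction(cost_f1, cost_f3):.1f}%")  # 42.6%
print(f"{relative_reduction(cost_f2, cost_f3):.1f}%")  # 18.5%

# Assumed counts (not from the abstract): 14 TP + 74 FP = 88 flagged reports.
print(f"{ppv(14, 74):.3f}")  # 0.159
```

Both percentages match the reductions quoted in the abstract, which confirms they are computed relative to Framework 1 and Framework 2 respectively.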

Related papers

Evaluating Large Language Models for Zero-Shot Disease Labeling in CT Radiology Reports Across Organ Systems [1.1373722549440357]
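The study compares a rule-based algorithm (RBA), RadBERT, and three lightweight open-weight LLMs for multi-disease labeling of chest, abdomen, and pelvis CT reports. Performance was evaluated using Cohen's kappa and micro/macro-averaged F1 scores.
Reference translation (metadata) (2025-06-03T18:00:08Z)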
MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks [47.486705282473984]
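Large language models (LLMs) achieve near-perfect scores on medical exams, but these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. MedHELM is an evaluation framework for assessing LLM performance on medical tasks.
Reference translation (metadata) (2025-05-26T22:55:49Z)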
Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
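This study explores multiple ML approaches for predicting length of stay (LOS) in the ICU for patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classical ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost, CatBoost) and neural networks (LSTM, BERT, Temporal Fusion Transformer).
Reference translation (metadata) (2025-05-23T14:06:42Z)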
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification [57.22053411719822]
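ChestX-Reasoner is a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports. Its two-stage training framework combines supervised fine-tuning with reinforcement learning guided by process rewards to better align model reasoning with clinical standards.
Reference translation (metadata) (2025-04-29T16:48:23Z)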
ThyroidEffi 1.0: A Cost-Effective System for High-Performance Multi-Class Thyroid Carcinoma Classification [0.0]
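The authors developed a deep learning system for classifying thyroid FNAB images. The three main categories (Benign, Indeterminate/Suspicious, and Malignant) directly guide post-biopsy treatment. The system processes 1,000 cases in 30 seconds, demonstrating feasibility on widely accessible hardware.
Reference translation (metadata) (2025-04-19T02:13:07Z)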
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks [49.0793012627959]
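This paper proposes VAPO, a novel framework tailored to reasoning models within the value-based paradigm. VAPO attains a state-of-the-art score of 60.4. In direct comparison under identical experimental settings, VAPO outperforms the results of DeepSeek-R1-Zero-Qwen-32B and DAPO by more than 10 points.
Reference translation (metadata) (2025-04-07T14:21:11Z)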
Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters [16.74673750576054]
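Pulmonary embolism registries accelerate research but rely on labor-intensive manual abstraction of radiology reports. The study examined whether concept extraction from computed tomography PE (CTPE) reports could be automated without compromising data quality. Four Llama 3 variants (3.0 8B, 3.1 8B, 3.1 70B, 3.3 70B) and one reviewer model, Phi 4 14B, were tested on 250 annotated CTPE reports each from MIMIC IV and Duke University. Accuracy, positive predictive value (PPV), and negative predictive value (NPV) were measured against a human gold standard.
Reference translation (metadata) (2025-03-26T21:38:06Z)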
Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model [1.7064514726335305]
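The study analyzed 9,683 Hebrew-language radiology reports from patients with Crohn's disease. It introduces uncertainty-aware prompt ensembles and an agent-based decision model.
Reference translation (metadata) (2025-02-02T16:57:03Z)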
Noisy probing dose facilitated dose prediction for pencil beam scanning proton therapy: physics enhances generalizability [18.852346492990637]
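AI-based dose prediction studies in photon and proton therapy often neglect the underlying physics. This study aims to design a physics-aware and generalizable AI-based dose prediction method for pencil beam scanning proton therapy (PBSPT).
Reference translation (metadata) (2023-12-02T00:15:44Z)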
Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification [52.77024349608834]
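The study investigates the chest radiograph (CXR) classification performance of vision transformers (ViTs) and the interpretability of attention-based saliency. ViTs were fine-tuned for lung disease classification using four public datasets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData. ViTs achieved CXR classification AUCs comparable to those of state-of-the-art CNNs.
Reference translation (metadata) (2023-03-03T12:05:41Z)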
Controlling False Positive/Negative Rates for Deep-Learning-Based Prostate Cancer Detection on Multiparametric MR images [58.85481248101611]
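This study proposes a novel prostate cancer (PCa) detection network that incorporates a lesion-level cost-sensitive loss and an additional slice-level loss based on a lesion-to-slice mapping function. The lesion-level FNR was reduced from 0.19 to 0.10, and the lesion-level FPR from 1.03 to 0.66.
Reference translation (metadata) (2021-06-04T09:51:27Z)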
Deep Learning Based Detection and Localization of Intracranial Aneurysms in Computed Tomography Angiography [5.973882600944421]
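A two-stage model was implemented: a 3D region proposal network for initial aneurysm detection and a 3D DenseNet for false-positive reduction. In comparisons at 0.25 FPPV and at the best F-1 score, the model showed statistically higher accuracy, sensitivity, and specificity.
Reference translation (metadata) (2020-05-22T10:49:23Z)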

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

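Information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use.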