Fugu-MT paper translation (abstract): Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients

Paper overview: Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients

arxiv url: http://arxiv.org/abs/2505.20609v1
Date: Tue, 27 May 2025 01:15:46 GMT
Status: translation complete
Last updated in system: 2025-05-28 17:05:58.331006
Title: Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients
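Title (reference translation): Comparison of a Large Language Model-based Real-Time Medical AI Interface and Physicians in Internal Medicine Using Simulated Patients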
Authors: Hyungjun Park, Chang-Yun Woo, Seungjo Lim, Seunghwan Lim, Keunho Kwak, Ju Young Jeong, Chong Hyun Suh,
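Abstract summary: An LLM-based real-time compound diagnostic medical AI interface was developed and compared with physicians on common internal medicine cases. The accuracy of the first and second differential diagnoses ranged from 70% to 90% for physicians, whereas the AI interface achieved 100%.
Reference score (attention score computed independently by this site): 1.0679692136113117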
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Objective: To develop an LLM-based real-time compound diagnostic medical AI interface and to perform a clinical trial comparing this interface with physicians for common internal medicine cases, based on United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS) style exams. Methods: A nonrandomized clinical trial was conducted on August 20, 2024. We recruited one general physician, two internal medicine residents (2nd and 3rd year), and five simulated patients. The clinical vignettes were adapted from the USMLE Step 2 CS style exams. We developed 10 representative internal medicine cases based on actual patients and included the information available on initial diagnostic evaluation. The primary outcome was the accuracy of the first differential diagnosis. Repeatability was evaluated based on the proportion of agreement. Results: The accuracy of the physicians' first differential diagnosis ranged from 50% to 70%, whereas the real-time compound diagnostic medical AI interface achieved an accuracy of 80%. The proportion of agreement for the first differential diagnosis was 0.7. The accuracy of the first and second differential diagnoses ranged from 70% to 90% for physicians, whereas the AI interface achieved an accuracy of 100%. The average time for the AI interface (557 sec) was 44.6% shorter than that of the physicians (1006 sec). The AI interface ($0.08) also reduced costs by 98.1% compared to the physicians' average ($4.2). Patient satisfaction scores ranged from 4.2 to 4.3 for care by physicians and were 3.9 for the AI interface. Conclusion: An LLM-based real-time compound diagnostic medical AI interface demonstrated diagnostic accuracy and patient satisfaction comparable to those of a physician, while requiring less time and lower costs. These findings suggest that AI interfaces may have the potential to assist primary care consultations for common internal medicine cases.
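Abstract (reference translation): Objective: An LLM-based real-time compound diagnostic medical AI interface was developed, and a clinical trial compared this interface with physicians on common internal medicine cases, based on USMLE Step 2 Clinical Skills (CS) style exams. Methods: A nonrandomized clinical trial was conducted on August 20, 2024. Participants were one general physician, two internal medicine residents (2nd and 3rd year), and five simulated patients. The clinical vignettes followed the USMLE Step 2 CS style. Ten representative internal medicine cases were developed from actual patients, including the information available at the initial diagnostic evaluation. The primary outcome was the accuracy of the first differential diagnosis. Repeatability was evaluated by the proportion of agreement. Results: The accuracy of the physicians' first differential diagnosis ranged from 50% to 70%, whereas the real-time compound diagnostic AI interface reached 80%. The proportion of agreement for the first differential diagnosis was 0.7. The accuracy of the first and second differential diagnoses ranged from 70% to 90% for physicians, whereas the AI interface achieved 100%. The average time for the AI interface (557 sec) was 44.6% shorter than that of the physicians (1006 sec). The AI interface ($0.08) also reduced costs by 98.1% compared with the physicians' average ($4.2). Patient satisfaction scores ranged from 4.2 to 4.3 for care by physicians and were 3.9 for the AI interface. These results suggest that AI interfaces may be able to assist primary care consultations for common internal medicine cases.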
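The time and cost reductions quoted in the abstract can be reproduced from the reported averages. The following is a minimal sketch of that arithmetic; the function name `percent_reduction` is illustrative and does not come from the paper.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Return how much smaller `new` is than `baseline`, in percent."""
    return (baseline - new) / baseline * 100


# Averages as reported in the abstract.
physician_time_sec = 1006
ai_time_sec = 557
physician_cost_usd = 4.2
ai_cost_usd = 0.08

time_reduction = percent_reduction(physician_time_sec, ai_time_sec)
cost_reduction = percent_reduction(physician_cost_usd, ai_cost_usd)

print(f"Time reduction: {time_reduction:.1f}%")  # 44.6%
print(f"Cost reduction: {cost_reduction:.1f}%")  # 98.1%
```

Both values match the abstract's figures when rounded to one decimal place.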

Related papers

AI-assisted workflow enables rapid, high-fidelity breast cancer clinical trial eligibility prescreening [4.008304844602351]
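We developed MSK-MATCH (Memorial Sloan Kettering Multi-Agent Trial Coordination Hub), an AI system that performs automated eligibility screening from clinical text. MSK-MATCH integrates a large language model with a curated oncology trial knowledge base and a retrieval-augmented architecture. In a retrospective analysis of 88,518 clinical documents from 731 patients across six breast cancer trials, MSK-MATCH resolved 61.9% of cases automatically and routed 38.1% to human review.
Paper reference translation (metadata) (2025-11-07T20:27:05Z)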
Evolving Diagnostic Agents in a Virtual Clinical Environment [75.59389103511559]
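This paper proposes a framework for training large language models (LLMs) as diagnostic agents using reinforcement learning. The method acquires diagnostic strategies through interactive exploration and outcome-based feedback. DiagAgent significantly outperforms 10 state-of-the-art LLMs, including DeepSeek-v3 and GPT-4o.
Paper reference translation (metadata) (2025-10-28T17:19:47Z)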
Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
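We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs). The dataset consists of 1,762 physician-curated clinical vignettes structured as interactive scenarios that simulate (oral) examinations in medical training. Our analysis identifies several failure modes that mirror cognitive errors in clinical practice.
Paper reference translation (metadata) (2025-10-11T16:24:35Z)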
Reverse Physician-AI Relationship: Full-process Clinical Diagnosis Driven by a Large Language Model [71.40113970879219]
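We propose a paradigm shift that reverses the relationship between physicians and AI. DxDirector-7B is an LLM with advanced deep-thinking capabilities that enables full-process diagnosis with minimal physician involvement. DxDirector-7B not only achieves superior diagnostic accuracy but also substantially reduces physician workload.
Paper reference translation (metadata) (2025-08-14T09:51:20Z)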
Toward the Autonomous AI Doctor: Quantitative Benchmarking of an Autonomous Agentic AI Versus Board-Certified Clinicians in a Real World Setting [0.0]
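A worldwide shortage of 11 million healthcare workers is projected by 2030. End-to-end large language model (LLM)-based AI systems have not been rigorously evaluated in real clinical practice.
Paper reference translation (metadata) (2025-06-27T19:04:44Z)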
Sequential Diagnosis with Language Models [21.22416732642907]
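This paper introduces the Sequential Diagnosis Benchmark, in which 304 cases are diagnosed step by step. Performance is assessed not only by diagnostic accuracy but also by the cost of physician visits and tests. We also introduce the MAI Diagnostic Orchestrator (MAI-DxO), a model-agnostic orchestrator that simulates a panel of physicians.
Paper reference translation (metadata) (2025-06-27T17:27:26Z)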
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [58.78045864541539]
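We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare produces ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system shows exceptional diagnostic performance across 2,919 diseases, achieving 100% accuracy for 1,013 of them.
Paper reference translation (metadata) (2025-06-25T13:42:26Z)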
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports [49.00805568780791]
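MedCaseReasoning is the first open-access dataset for evaluating Large Language Models (LLMs). The dataset contains 14,489 diagnostic question-and-answer cases, each paired with detailed reasoning statements. We evaluate state-of-the-art reasoning LLMs on MedCaseReasoning and find significant shortcomings in their diagnoses and reasoning.
Paper reference translation (metadata) (2025-05-16T22:34:36Z)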
A Scalable Approach to Benchmarking the In-Conversation Differential Diagnostic Accuracy of a Health AI [0.0]
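This study proposes a scalable benchmarking methodology for evaluating health AI systems. The approach uses 400 validated clinical vignettes spanning 14 specialties, with AI-powered patient actors simulating realistic clinical interactions. August achieved a top-1 diagnostic accuracy of 81.8% (327/400 cases) and a top-2 diagnostic accuracy of 85.0% (340/400 cases), outperforming traditional symptom checkers.
Paper reference translation (metadata) (2024-12-17T05:02:33Z)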
A Deep Learning System for Rapid and Accurate Warning of Acute Aortic Syndrome on Non-contrast CT in China [23.161834941227337]
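Aortic CT angiography (CTA) is the imaging protocol of choice for patients with suspected acute aortic syndrome (AAS). Because of economic and workflow constraints in China, the majority of suspected patients initially undergo non-contrast CT as their first imaging examination. We propose iAorta, an artificial intelligence-based warning system that uses non-contrast CT to identify AAS.
Paper reference translation (metadata) (2024-06-14T02:15:09Z)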
Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray [86.38767955626179]
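We developed a deep learning algorithm that predicts the coronary artery calcium (CAC) score, using 460 chest X-rays. The diagnostic accuracy of the AI-CAC model was assessed by the area under the curve (AUC).
Paper reference translation (metadata) (2024-03-27T16:56:14Z)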
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
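We introduce AI Hospital, a framework that simulates dynamic medical interactions between the Doctor, as the player, and NPCs. This setup enables realistic evaluation of LLMs in clinical scenarios. We develop a multi-view medical evaluation benchmark using high-quality Chinese medical records and NPCs.
Paper reference translation (metadata) (2024-02-15T06:46:48Z)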
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias [5.421033429862095]
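Cognitive biases in clinical decision-making contribute substantially to diagnostic errors and suboptimal patient outcomes. This study investigates the role large language models can play in mitigating these biases through a multi-agent framework.
Paper reference translation (metadata) (2024-01-26T01:35:50Z)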
A deep learning pipeline for localization, differentiation, and uncertainty estimation of liver lesions using multi-phasic and multi-sequence MRI [15.078841623264543]
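We propose a fully automated computer-aided diagnosis (CAD) solution for liver lesion assessment. The study covers 400 patients who underwent liver resection or biopsy and were diagnosed with hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma, or secondary metastasis. The fully automated deep CAD pipeline localizes lesions in 3D MRI using key-slice analysis and provides a confidence measure for its diagnoses.
Paper reference translation (metadata) (2021-10-17T13:19:00Z)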
COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching [70.08786840301435]
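This paper proposes the CrOss-Modal PseudO-SiamEse Network (COMPOSE). Experiments show that it reaches an AUC of 98.0% on patient-criteria matching and an accuracy of 83.7% on patient-trial matching.
Paper reference translation (metadata) (2020-06-15T21:01:33Z)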
Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan [49.209225484926634]
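We propose a joint classification and regression method to determine whether a patient will later develop severe symptoms. The method takes the weight of each sample into account to reduce the influence of outliers and to address the imbalanced classification problem. In predicting severe cases it achieved an accuracy of 76.97%, a correlation coefficient of 0.524, and a difference of 0.55 days in conversion time.
Paper reference translation (metadata) (2020-05-07T12:16:37Z)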

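The related papers list is generated automatically from the titles and abstracts of papers on this site.

This is information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).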