Fugu-MT Paper Translation (Abstract): Simulating Clinical AI Assistance using Multimodal LLMs: A Case Study in Diabetic Retinopathy

Paper overview: Simulating Clinical AI Assistance using Multimodal LLMs: A Case Study in Diabetic Retinopathy

arxiv url: http://arxiv.org/abs/2509.13234v1
Date: Tue, 16 Sep 2025 16:42:19 GMT
Status: Translation complete
Last updated in system: 2025-09-17 17:50:53.178311
Title: Simulating Clinical AI Assistance using Multimodal LLMs: A Case Study in Diabetic Retinopathy
Title (reference translation): Simulation of Clinical AI Assistance Using Multimodal LLMs: Diabetic Retinopathy as a Case Study
Authors: Nadim Barakat, William Lotter
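Abstract summary: Diabetic retinopathy (DR) is a leading cause of blindness worldwide, and AI systems can expand access to fundus photography screening. We evaluated multimodal large language models (MLLMs) for DR detection and their ability to simulate clinical AI assistance across different output types. The findings suggest that MLLMs may improve DR screening pipelines and serve as scalable simulators for studying clinical AI assistance across varying output configurations.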
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diabetic retinopathy (DR) is a leading cause of blindness worldwide, and AI systems can expand access to fundus photography screening. Current FDA-cleared systems primarily provide binary referral outputs, where this minimal output may limit clinical trust and utility. Yet, determining the most effective output format to enhance clinician-AI performance is an empirical challenge that is difficult to assess at scale. We evaluated multimodal large language models (MLLMs) for DR detection and their ability to simulate clinical AI assistance across different output types. Two models were tested on IDRiD and Messidor-2: GPT-4o, a general-purpose MLLM, and MedGemma, an open-source medical model. Experiments included: (1) baseline evaluation, (2) simulated AI assistance with synthetic predictions, and (3) actual AI-to-AI collaboration where GPT-4o incorporated MedGemma outputs. MedGemma outperformed GPT-4o at baseline, achieving higher sensitivity and AUROC, while GPT-4o showed near-perfect specificity but low sensitivity. Both models adjusted predictions based on simulated AI inputs, but GPT-4o's performance collapsed with incorrect ones, whereas MedGemma remained more stable. In actual collaboration, GPT-4o achieved strong results when guided by MedGemma's descriptive outputs, even without direct image access (AUROC up to 0.96). These findings suggest MLLMs may improve DR screening pipelines and serve as scalable simulators for studying clinical AI assistance across varying output configurations. Open, lightweight models such as MedGemma may be especially valuable in low-resource settings, while descriptive outputs could enhance explainability and clinician trust in clinical workflows.
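Abstract (reference translation): Diabetic retinopathy (DR) is a leading cause of blindness worldwide, and AI systems can expand access to fundus photography screening. Current FDA-cleared systems mainly provide binary referral outputs, and such minimal output may limit clinical trust and utility. However, determining which output format most effectively improves clinician-AI performance is an empirical question that is hard to evaluate at scale. We evaluated multimodal large language models (MLLMs) for DR detection and their ability to simulate clinical AI assistance across various output types. Two models were tested on IDRiD and Messidor-2: GPT-4o, a general-purpose MLLM, and MedGemma, an open-source medical model. The experiments comprised (1) baseline evaluation, (2) simulated AI assistance using synthetic predictions, and (3) actual AI-to-AI collaboration in which GPT-4o incorporated MedGemma outputs. At baseline, MedGemma outperformed GPT-4o with higher sensitivity and AUROC, whereas GPT-4o showed near-perfect specificity but low sensitivity. Both models adjusted their predictions based on the simulated AI inputs, but GPT-4o's performance collapsed when those inputs were incorrect, while MedGemma remained more stable. In actual collaboration, GPT-4o achieved strong results when guided by MedGemma's descriptive outputs, even without direct image access (AUROC up to 0.96). These findings suggest that MLLMs may improve DR screening pipelines and serve as scalable simulators for studying clinical AI assistance across varying output configurations. Open, lightweight models such as MedGemma may be especially valuable in low-resource settings, while descriptive outputs could enhance explainability and clinician trust in clinical workflows.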
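Note on the metrics: the abstract compares the models by sensitivity, specificity, and AUROC. The following is a minimal sketch of how these three quantities can be computed for a binary referral task. It is not the authors' evaluation code; the function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration.

```python
# Illustrative only: toy data and helper names are hypothetical,
# not taken from the paper.

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = referable DR)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auroc(y_true, scores):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores for 4 DR-positive and 4 DR-negative images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.35, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 for the binary referral output.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUROC={auc:.4f}")
```

In this toy case the thresholded predictions miss half of the positive images while never flagging a negative one, so specificity is perfect and sensitivity is low even though the underlying scores rank the cases well. This is the kind of pattern the abstract reports for GPT-4o at baseline, and it shows why a threshold-free metric such as AUROC is reported alongside sensitivity and specificity.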
