Fugu-MT Paper Translation (Abstract): LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology

Paper overview: LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology

arxiv url: http://arxiv.org/abs/2509.25620v1
Date: Tue, 30 Sep 2025 00:29:18 GMT
Status: Translation complete
System update date: 2025-10-01 14:44:59.968314
Title: LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology
Authors: Zhenyue Qin, Yang Liu, Yu Yin, Jinyu Ding, Haoran Zhang, Anran Li, Dylan Campbell, Xuansheng Wu, Ke Zou, Tiarnan D. L. Keenan, Emily Y. Chew, Zhiyong Lu, Yih-Chung Tham, Ninghao Liu, Xiuzhen Zhang, Qingyu Chen
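Abstract summary: Vision-threatening eye diseases pose a major global health burden, with timely diagnosis limited by workforce shortages and restricted access to specialized care. The paper presents a large-scale multimodal ophthalmology benchmark comprising 32,633 instances with multi-granular annotations across 12 common ophthalmic conditions and 5 imaging modalities. The dataset integrates imaging, anatomical structures, demographics, and free-text annotations, supporting anatomical structure recognition, disease screening, disease staging, and demographic prediction for bias evaluation.
Reference score (independently computed attention score): 43.092364533480456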
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-threatening eye diseases pose a major global health burden, with timely diagnosis limited by workforce shortages and restricted access to specialized care. While multimodal large language models (MLLMs) show promise for medical image interpretation, advancing MLLMs for ophthalmology is hindered by the lack of comprehensive benchmark datasets suitable for evaluating generative models. We present a large-scale multimodal ophthalmology benchmark comprising 32,633 instances with multi-granular annotations across 12 common ophthalmic conditions and 5 imaging modalities. The dataset integrates imaging, anatomical structures, demographics, and free-text annotations, supporting anatomical structure recognition, disease screening, disease staging, and demographic prediction for bias evaluation. This work extends our preliminary LMOD benchmark with three major enhancements: (1) nearly 50% dataset expansion with substantial enlargement of color fundus photography; (2) broadened task coverage including binary disease diagnosis, multi-class diagnosis, severity classification with international grading standards, and demographic prediction; and (3) systematic evaluation of 24 state-of-the-art MLLMs. Our evaluations reveal both promise and limitations. Top-performing models achieved ~58% accuracy in disease screening under zero-shot settings, and performance remained suboptimal for challenging tasks like disease staging. We will publicly release the dataset, curation pipeline, and leaderboard to potentially advance ophthalmic AI applications and reduce the global burden of vision-threatening diseases.
