Fugu-MT Paper Translation (Abstract): PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

Paper abstract: PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

arxiv url: http://arxiv.org/abs/2401.16355v3
Date: Wed, 20 Mar 2024 17:13:53 GMT
Status: Translation complete
System update date: 2024-03-21 21:48:20.114537
Title: PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Title (reference translation): PathMMU: A Large-Scale Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Authors: Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Dan Wan, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin, Lin Yang
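Abstract summary: We introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for large multimodal models (LMMs). It comprises 33,428 multimodal multi-choice questions and 24,067 images from various sources, each accompanied by an explanation of the correct answer. The construction of PathMMU leverages GPT-4V's advanced capabilities, using over 30,000 image-caption pairs to enrich captions and generate the corresponding Q&As.
Reference score (independently computed attention score): 14.944207181507135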
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The emergence of large multimodal models has unlocked remarkable potential in AI, particularly in pathology. However, the lack of specialized, high-quality benchmarks has impeded their development and precise evaluation. To address this, we introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for Large Multimodal Models (LMMs). It comprises 33,428 multimodal multi-choice questions and 24,067 images from various sources, each accompanied by an explanation for the correct answer. The construction of PathMMU harnesses GPT-4V's advanced capabilities, utilizing over 30,000 image-caption pairs to enrich captions and generate corresponding Q&As in a cascading process. Significantly, to maximize PathMMU's authority, we invite seven pathologists to scrutinize each question under strict standards in PathMMU's validation and test sets, while simultaneously setting an expert-level performance benchmark for PathMMU. We conduct extensive evaluations, including zero-shot assessments of 14 open-sourced and 4 closed-sourced LMMs and their robustness to image corruption. We also fine-tune representative LMMs to assess their adaptability to PathMMU. The empirical findings indicate that advanced LMMs struggle with the challenging PathMMU benchmark, with the top-performing LMM, GPT-4V, achieving only a 49.8% zero-shot performance, significantly lower than the 71.8% demonstrated by human pathologists. After fine-tuning, significantly smaller open-sourced LMMs can outperform GPT-4V but still fall short of the expertise shown by pathologists. We hope that PathMMU will offer valuable insights and foster the development of more specialized, next-generation LMMs for pathology.
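Abstract (reference translation): The emergence of large multimodal models has unlocked remarkable potential in AI, particularly in pathology. However, the lack of specialized, high-quality benchmarks has hindered their development and accurate evaluation. We therefore introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for Large Multimodal Models (LMMs). It consists of 33,428 multimodal multi-choice questions and 24,067 images from various sources, each accompanied by an explanation of the correct answer. The construction of PathMMU leverages GPT-4V's advanced capabilities, using over 30,000 image-caption pairs to enrich captions and generate the corresponding Q&As in a cascading process. To maximize PathMMU's authority, we have seven pathologists scrutinize each question in PathMMU's validation and test sets under strict standards, while also establishing an expert-level performance benchmark for PathMMU. We conduct a broad range of evaluations, including zero-shot assessments of 14 open-source and 4 closed-source LMMs and of their robustness to image corruption. We also fine-tune representative LMMs to assess their adaptability to PathMMU. The experiments show that advanced LMMs struggle with the challenging PathMMU benchmark: the top-performing LMM, GPT-4V, achieves only 49.8% zero-shot performance, considerably lower than the 71.8% achieved by human pathologists. After fine-tuning, considerably smaller open-source LMMs outperform GPT-4V but still fall short of the expertise shown by pathologists. We hope that PathMMU will provide valuable insights and promote the development of more specialized, next-generation LMMs.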

Related papers