Fugu-MT Paper Translation (Abstract): MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs

Paper abstract: MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs

arxiv url: http://arxiv.org/abs/2510.25867v1
Date: Wed, 29 Oct 2025 18:10:44 GMT
Status: Translation complete
Last updated in system: 2025-10-31 16:05:09.53164
Title: MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
Title (reference translation): MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
Authors: Xiaoke Huang, Ningsen Wang, Hui Liu, Xianfeng Tang, Yuyin Zhou,
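Abstract summary: This paper introduces MedVLSynther, a rubric-guided generator-verifier framework that synthesizes high-quality VQA items directly from biomedical literature. Applying the pipeline to PubMed Central yields MedSynVQA: 13,087 audited questions over 14,803 images spanning 13 imaging modalities and 28 anatomical regions. Training open-weight LMMs with reinforcement learning using verifiable rewards improves accuracy across six medical VQA benchmarks.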
Reference score (independently computed attention score): 31.549507149494904
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Multimodal Models (LMMs) are increasingly capable of answering medical questions that require joint reasoning over images and text, yet training general medical VQA systems is impeded by the lack of large, openly usable, high-quality corpora. We present MedVLSynther, a rubric-guided generator-verifier framework that synthesizes high-quality multiple-choice VQA items directly from open biomedical literature by conditioning on figures, captions, and in-text references. The generator produces self-contained stems and parallel, mutually exclusive options under a machine-checkable JSON schema; a multi-stage verifier enforces essential gates (self-containment, single correct answer, clinical validity, image-text consistency), awards fine-grained positive points, and penalizes common failure modes before acceptance. Applying this pipeline to PubMed Central yields MedSynVQA: 13,087 audited questions over 14,803 images spanning 13 imaging modalities and 28 anatomical regions. Training open-weight LMMs with reinforcement learning using verifiable rewards improves accuracy across six medical VQA benchmarks, achieving averages of 55.85 (3B) and 58.15 (7B), with up to 77.57 on VQA-RAD and 67.76 on PathVQA, outperforming strong medical LMMs. Ablations verify that both generation and verification are necessary and that more verified data consistently helps, and a targeted contamination analysis detects no leakage from evaluation suites. By operating entirely on open literature and open-weight models, MedVLSynther offers an auditable, reproducible, and privacy-preserving path to scalable medical VQA training data.
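Abstract (reference translation): Large Multimodal Models (LMMs) are increasingly capable of answering medical questions that require joint reasoning over images and text, yet training general medical VQA systems is hampered by the lack of large, openly usable, high-quality corpora. MedVLSynther is a rubric-guided generator-verifier framework that synthesizes high-quality VQA items directly from open biomedical literature, conditioning on figures, captions, and in-text references. A multi-stage verifier enforces essential gates (self-containment, single correct answer, clinical validity, image-text consistency), awards fine-grained positive points, and penalizes common failure modes before acceptance. Applying this pipeline to PubMed Central yields MedSynVQA: 13,087 audited questions over 14,803 images spanning 13 imaging modalities and 28 anatomical regions. Training open-weight LMMs with reinforcement learning using verifiable rewards achieves averages of 55.85 (3B) and 58.15 (7B) across six medical VQA benchmarks, with up to 77.57 on VQA-RAD and 67.76 on PathVQA, outperforming strong medical LMMs. Ablations verify that both generation and verification are necessary and that more verified data consistently helps, and a targeted contamination analysis detects no leakage from evaluation suites. By operating entirely on open literature and open-weight models, MedVLSynther offers an auditable, reproducible, and privacy-preserving path to scalable medical VQA training data.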
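
The abstract describes a generator that emits items under a machine-checkable schema and a verifier that applies essential gates, positive points, and penalties before acceptance. The following is a minimal sketch of that acceptance logic, not the authors' implementation. The four gate names come from the abstract; the field names (`question`, `options`, `answer`), the `Verdict` class, the point values, and the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# The four essential gates are named in the abstract. Everything else here
# (field names, scoring rule, threshold) is a hypothetical illustration.
ESSENTIAL_GATES = (
    "self_containment",
    "single_correct_answer",
    "clinical_validity",
    "image_text_consistency",
)


@dataclass
class Verdict:
    """Hypothetical verifier output for one candidate VQA item."""
    gates: dict                  # gate name -> True/False
    bonus_points: float = 0.0    # fine-grained positive points
    penalty_points: float = 0.0  # deductions for common failure modes


def check_schema(item: dict) -> bool:
    """Structural check: non-empty stem, distinct options, one valid answer key."""
    stem = item.get("question")
    options = item.get("options")
    answer = item.get("answer")
    if not isinstance(stem, str) or not stem.strip():
        return False
    if not isinstance(options, dict) or len(options) < 2:
        return False
    if not all(isinstance(v, str) and v.strip() for v in options.values()):
        return False
    if len(set(options.values())) != len(options):  # options must be distinct
        return False
    return isinstance(answer, str) and answer in options


def accept(item: dict, verdict: Verdict, threshold: float = 0.0) -> bool:
    """Accept only if the schema holds, every gate passes, and the net score clears the threshold."""
    if not check_schema(item):
        return False
    if not all(verdict.gates.get(g, False) for g in ESSENTIAL_GATES):
        return False
    return verdict.bonus_points - verdict.penalty_points >= threshold
```

In this sketch the gates are hard constraints, so a single failed gate rejects the item regardless of its points. The points and penalties only rank or filter items that have already passed every gate.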

Related papers