Fugu-MT Paper Translation (Abstract): InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning

Paper abstract: InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning

arxiv url: http://arxiv.org/abs/2509.22261v1
Date: Fri, 26 Sep 2025 12:26:16 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.415277
Title: InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models with Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning
Title (reference translation): InfiMed-Foundation: Pioneering Advanced Multimodal Medical Models Using Compute-Efficient Pre-Training and Multi-Stage Fine-Tuning
Authors: Guanghao Zhu, Zhitian Hou, Zeyu Liu, Zhijie Sang, Congkai Xie, Hongxia Yang
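Abstract summary: InfiMed-Foundation-1.7B and InfiMed-Foundation-4B are two medical MLLMs designed to deliver state-of-the-art performance in medical applications. We use low-to-high image resolution and multimodal sequence packing to improve training efficiency. InfiMed-Foundation-1.7B outperforms Qwen2.5VL-3B, and InfiMed-Foundation-4B surpasses HuatuoGPT-V-7B and MedGemma-27B-IT.
Reference score (independently computed attention score): 19.791150694039466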
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal large language models (MLLMs) have shown remarkable potential in various domains, yet their application in the medical field is hindered by several challenges. General-purpose MLLMs often lack the specialized knowledge required for medical tasks, leading to uncertain or hallucinatory responses. Knowledge distillation from advanced models struggles to capture domain-specific expertise in radiology and pharmacology. Additionally, the computational cost of continual pretraining with large-scale medical data poses significant efficiency challenges. To address these issues, we propose InfiMed-Foundation-1.7B and InfiMed-Foundation-4B, two medical-specific MLLMs designed to deliver state-of-the-art performance in medical applications. We combined high-quality general-purpose and medical multimodal data and proposed a novel five-dimensional quality assessment framework to curate high-quality multimodal medical datasets. We employ low-to-high image resolution and multimodal sequence packing to enhance training efficiency, enabling the integration of extensive medical data. Furthermore, a three-stage supervised fine-tuning process ensures effective knowledge extraction for complex medical tasks. Evaluated on the MedEvalKit framework, InfiMed-Foundation-1.7B outperforms Qwen2.5VL-3B, while InfiMed-Foundation-4B surpasses HuatuoGPT-V-7B and MedGemma-27B-IT, demonstrating superior performance in medical visual question answering and diagnostic tasks. By addressing key challenges in data quality, training efficiency, and domain-specific knowledge extraction, our work paves the way for more reliable and effective AI-driven solutions in healthcare. The InfiMed-Foundation-4B model is available at https://huggingface.co/InfiX-ai/InfiMed-Foundation-4B.
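Abstract (reference translation): Multimodal large language models (MLLMs) have shown remarkable potential across many domains, but applying them to medicine faces several challenges. General-purpose MLLMs often lack the specialized knowledge that medical tasks require, which leads to uncertain or hallucinated responses. Knowledge distillation from advanced models struggles to capture domain-specific expertise in radiology and pharmacology. In addition, the computational cost of continual pre-training on large-scale medical data creates significant efficiency challenges. To address these issues, we propose InfiMed-Foundation-1.7B and InfiMed-Foundation-4B, two medical MLLMs designed to deliver state-of-the-art performance in medical applications. We combined high-quality general-purpose and medical multimodal data and proposed a new five-dimensional quality assessment framework for curating high-quality multimodal medical datasets. To improve training efficiency, we use low-to-high image resolution and multimodal sequence packing, which makes it possible to integrate extensive medical data. A three-stage supervised fine-tuning process further ensures effective knowledge extraction for complex medical tasks. Evaluated on the MedEvalKit framework, InfiMed-Foundation-1.7B outperforms Qwen2.5VL-3B, and InfiMed-Foundation-4B surpasses HuatuoGPT-V-7B and MedGemma-27B-IT, showing superior performance on medical visual question answering and diagnostic tasks. By addressing key challenges in data quality, training efficiency, and domain-specific knowledge extraction, our work paves the way for more reliable and effective AI-driven solutions in healthcare. The InfiMed-Foundation-4B model is available at https://huggingface.co/InfiX-ai/InfiMed-Foundation-4B.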
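
The abstract names multimodal sequence packing as one source of training efficiency but does not describe how it is done. As a rough illustration of the general idea only, the sketch below packs variable-length samples into fixed-length training sequences with a greedy first-fit-decreasing heuristic, which reduces the share of tokens spent on padding. The function names, the heuristic, and the sample lengths are all assumptions made for this example; they are not taken from the paper and may differ from the authors' method.

```python
from typing import List


def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sample indices into bins whose total length fits in max_len.

    Hypothetical helper: greedy first-fit-decreasing bin packing.
    """
    if any(n > max_len for n in lengths):
        raise ValueError("a sample is longer than max_len")

    # Visit the longest samples first; they are the hardest to place.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

    bins: List[List[int]] = []  # sample indices per packed sequence
    free: List[int] = []        # remaining token budget per packed sequence
    for i in order:
        for b, room in enumerate(free):
            if lengths[i] <= room:
                bins[b].append(i)
                free[b] -= lengths[i]
                break
        else:
            # No existing bin has room, so open a new packed sequence.
            bins.append([i])
            free.append(max_len - lengths[i])
    return bins


def padding_fraction(lengths: List[int], n_sequences: int, max_len: int) -> float:
    """Share of token slots that hold padding instead of real tokens."""
    return 1.0 - sum(lengths) / (n_sequences * max_len)


if __name__ == "__main__":
    # Made-up token counts (text plus image tokens) for six samples.
    sample_lengths = [700, 300, 512, 200, 1000, 24]
    packed = pack_sequences(sample_lengths, max_len=1024)
    print("packed bins:", packed)
    print("padding, one sample per sequence:",
          padding_fraction(sample_lengths, len(sample_lengths), 1024))
    print("padding, packed:",
          padding_fraction(sample_lengths, len(packed), 1024))
```

In an actual training pipeline, packed samples usually also need per-sample attention masking and position handling so that tokens from one sample do not attend to another; that part is omitted here.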

