Fugu-MT Paper Translation (Summary): MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction

Paper summary: MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction

arxiv url: http://arxiv.org/abs/2509.23368v1
Date: Sat, 27 Sep 2025 15:30:20 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.187197
Title: MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction
Authors: Xinchun Su, Chunxu Luo, Yixuan Li, Weidong Yang, Lipeng Ma,
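Abstract summary: Small language models often underperform compared to large language models such as GPT-4 and DeepSeek. Recent knowledge distillation methods aim to address these issues through teacher-guided error correction. We therefore propose MedCritical, a two-stage framework that uses a small language model fine-tuned by a large teacher model.
Reference score (in-house attention metric): 22.35140929464229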
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the field of medicine, complex reasoning tasks such as clinical diagnosis, treatment planning, and medical knowledge integration pose significant challenges, where small language models often underperform compared to large language models like GPT-4 and DeepSeek. Recent knowledge distillation-based methods aim to address these issues through teacher-guided error correction, but this LLM-as-a-judge approach remains challenging in terms of cost, time, and efficiency. To circumvent this issue, we propose a novel two-stage framework, MedCritical, which uses a small language model fine-tuned by a large teacher model to play against itself. In the first stage, we extract high-level and detailed long-chain thought templates from the teacher model to guide the student model to generate more complex reasoning thoughts. In the second stage, we introduce direct preference optimization (DPO) through model self-iteration collaboration to enhance the reasoning ability of the student model by playing against the correction trajectory of the fine-tuned model during training. This model self-learning DPO approach teaches the student model to use its own error-driven insights to consolidate its skills and knowledge to solve complex problems, and achieves comparable results to traditional knowledge distillation methods using teacher models at a lower cost. Notably, our MedCritical 7B model outperforms the Taiyi and Huatuo-o1-7B models by 3.04% and 10.12% respectively on the CMExam benchmark, achieving new SOTA performance among 7B-class small models.
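The second stage relies on direct preference optimization, but the abstract does not spell out the objective. The following is a minimal sketch of the standard DPO loss in plain Python, assuming each preference pair uses the corrected reasoning trajectory as the preferred ("chosen") response and the student's original erroneous trajectory as the dispreferred ("rejected") one. The function names `neg_log_sigmoid` and `dpo_loss`, the `beta=0.1` default, and the sample log-probabilities are hypothetical illustrations, not details taken from the paper.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. softplus(-z)."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed token log-probability of a full response
    under the trainable policy or the frozen reference model.
    Assumption: "chosen" is the corrected reasoning trajectory and
    "rejected" is the student's original erroneous trajectory.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Loss = -log sigmoid(beta * (chosen margin - rejected margin)).
    return neg_log_sigmoid(beta * (chosen_ratio - rejected_ratio))


if __name__ == "__main__":
    # Illustrative log-probabilities only, not results from the paper.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The loss falls below log 2 only when the policy raises the chosen trajectory's log-probability, relative to the reference model, more than the rejected one's; `beta` controls how strongly the policy is kept close to the reference.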


Related Papers