Fugu-MT Paper Translation (Overview): AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration

Paper overview: AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration

arxiv url: http://arxiv.org/abs/2509.24560v1
Date: Mon, 29 Sep 2025 10:13:55 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.917328
Title: AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration
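Title (reference translation): AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration
Authors: Shaohao Rui, Kaitao Chen, Weijie Ma, Xiaosong Wang
Abstract summary: This paper proposes AdaThink-Med, an end-to-end framework for improving the adaptive thinking ability of medical reasoning models. AdaThink-Med first generates multiple candidate outputs for each question, evaluates the correctness and uncertainty of each candidate, and then estimates problem difficulty with an uncertainty-guided length calibration module. On six public medical QA benchmarks, AdaThink-Med achieves up to 6.4x length reduction on average while retaining performance with only minimal degradation.
Reference score (independently computed attention score): 4.33177021777927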
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in inference time scaling with extended long chain-of-thought have significantly improved the reasoning capabilities of both general and medical large language models (LLMs). However, these models tend to engage in lengthy reasoning processes regardless of the difficulty of the input question, leading to increased inference costs in real-world applications. Therefore, enabling adaptive thinking where models think less for simpler questions and think more for complex ones is critical for the effective use of medical LLMs in practice. Despite its importance, there is a lack of end-to-end approaches designed to enhance the adaptive thinking capabilities of medical LLMs while providing a comprehensive examination of the trade-off between performance and computational cost. To bridge this gap, we propose AdaThink-Med, the first end-to-end framework designed to enhance adaptive thinking ability in medical reasoning models with uncertainty-guided length calibration. AdaThink-Med first generates multiple candidate outputs for each question, evaluates the correctness and uncertainty of each candidate, and then estimates problem difficulty via an uncertainty-guided length calibration module. For outputs with low difficulty and correct answers, the framework penalizes longer reasoning paths; whereas for those with high difficulty and incorrect answers, it encourages extending the chain of thought to explore alternative solutions. On six public medical QA benchmarks, AdaThink-Med achieves up to 6.4x length reduction on average while retaining performance with only minimal degradation. Intriguingly, we observe that AdaThink-Med spontaneously develops two distinct reasoning modes, which we characterize as "non-thinking" and "thinking", demonstrating the model's ability to suppress redundant reasoning processes dynamically.
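Abstract (reference translation): Recent progress in inference-time scaling with long chain-of-thought reasoning has substantially improved the reasoning ability of both general-purpose and medical large language models (LLMs). These models, however, tend to reason at length regardless of how difficult the input question is, which raises inference costs in real-world applications. Adaptive thinking, in which a model thinks less on simpler questions and more on complex ones, is therefore important for using medical LLMs effectively in practice. Despite this importance, end-to-end approaches that strengthen the adaptive thinking of medical LLMs while comprehensively examining the trade-off between performance and computational cost are lacking. To fill this gap, the authors propose AdaThink-Med, the first end-to-end framework designed to strengthen adaptive thinking in medical reasoning models through uncertainty-guided length calibration. AdaThink-Med first generates multiple candidate outputs for each question, evaluates the correctness and uncertainty of each candidate, and then estimates problem difficulty with an uncertainty-guided length calibration module. For outputs with low difficulty and correct answers, the framework penalizes longer reasoning paths; for outputs with high difficulty and incorrect answers, it encourages extending the chain of thought to explore alternative solutions. On six public medical QA benchmarks, AdaThink-Med achieves up to 6.4x length reduction on average while retaining performance with only minimal degradation. Interestingly, AdaThink-Med spontaneously develops two distinct reasoning modes, characterized as "non-thinking" and "thinking", which demonstrates the model's ability to suppress redundant reasoning dynamically.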
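
The reward logic described in the abstract (sample several candidates, score correctness and uncertainty, estimate difficulty, then penalize or encourage length) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the `Candidate` fields, the difficulty proxy (an equal blend of error rate and mean uncertainty), the `weight` parameter, and the linear length term are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Candidate:
    """One sampled answer for a question (all fields are illustrative)."""
    correct: bool       # whether the final answer matched the reference
    uncertainty: float  # assumed to lie in [0, 1]; higher means less confident
    length: int         # number of reasoning tokens


def estimate_difficulty(candidates):
    """Hypothetical difficulty proxy in [0, 1].

    Assumption: an equal blend of the group's error rate and its mean
    uncertainty. The paper's actual estimator is not given in the abstract.
    """
    error_rate = 1.0 - mean(1.0 if c.correct else 0.0 for c in candidates)
    avg_uncertainty = mean(c.uncertainty for c in candidates)
    return 0.5 * error_rate + 0.5 * avg_uncertainty


def shaped_rewards(candidates, max_length, weight=0.5):
    """Return one reward per candidate: correctness plus a length term."""
    difficulty = estimate_difficulty(candidates)
    rewards = []
    for c in candidates:
        rel_len = min(c.length / max_length, 1.0)
        base = 1.0 if c.correct else 0.0
        if c.correct:
            # Correct answer: penalize length, more strongly on easy questions.
            length_term = -weight * (1.0 - difficulty) * rel_len
        else:
            # Wrong answer: reward length, more strongly on hard questions.
            length_term = weight * difficulty * rel_len
        rewards.append(base + length_term)
    return rewards


if __name__ == "__main__":
    easy = [Candidate(True, 0.1, 100), Candidate(True, 0.1, 800)]
    hard = [Candidate(False, 0.9, 100), Candidate(False, 0.9, 800)]
    print(shaped_rewards(easy, max_length=1000))
    print(shaped_rewards(hard, max_length=1000))
```

With `weight` at or below 0.5, a correct candidate always scores higher than an incorrect one from the same group, so the length term only reorders candidates that share the same correctness. Rewards of this kind would typically feed a policy-gradient update over the sampled group; that training step is outside the scope of this sketch.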

Related papers