Fugu-MT Paper Translation (Abstract): Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains

Paper overview: Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains

arxiv url: http://arxiv.org/abs/2510.07748v1
Date: Thu, 09 Oct 2025 03:35:37 GMT
Status: Translation complete
Last updated in system: 2025-10-10 17:54:14.849616
Title: Haibu Mathematical-Medical Intelligent Agent: Enhancing Large Language Model Reliability in Medical Tasks via Verifiable Reasoning Chains
Authors: Yilun Zhang, Dexing Kong
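Abstract summary: Large Language Models (LLMs) show promise in medicine but are prone to factual and logical errors. The Haibu Mathematical-Medical Intelligent Agent (MMIA) ensures reliability through a formally verifiable reasoning process. MMIA's "bootstrapping" mode stores validated reasoning chains as "theorems."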
Reference score (independently computed attention score): 4.198863375486898
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large Language Models (LLMs) show promise in medicine but are prone to factual and logical errors, which is unacceptable in this high-stakes field. To address this, we introduce the "Haibu Mathematical-Medical Intelligent Agent" (MMIA), an LLM-driven architecture that ensures reliability through a formally verifiable reasoning process. MMIA recursively breaks down complex medical tasks into atomic, evidence-based steps. This entire reasoning chain is then automatically audited for logical coherence and evidence traceability, similar to theorem proving. A key innovation is MMIA's "bootstrapping" mode, which stores validated reasoning chains as "theorems." Subsequent tasks can then be efficiently solved using Retrieval-Augmented Generation (RAG), shifting from costly first-principles reasoning to a low-cost verification model. We validated MMIA across four healthcare administration domains, including DRG/DIP audits and medical insurance adjudication, using expert-validated benchmarks. Results showed MMIA achieved an error detection rate exceeding 98% with a false positive rate below 1%, significantly outperforming baseline LLMs. Furthermore, the RAG matching mode is projected to reduce average processing costs by approximately 85% as the knowledge base matures. In conclusion, MMIA's verifiable reasoning framework is a significant step toward creating trustworthy, transparent, and cost-effective AI systems, making LLM technology viable for critical applications in medicine.
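The audit-then-store idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Step`, `audit`, and `TheoremStore` are hypothetical, the coherence check is reduced to "a step may only depend on earlier steps," and an exact-key lookup stands in for the RAG retrieval the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One atomic reasoning step (hypothetical structure, not from the paper)."""
    claim: str
    evidence: list = field(default_factory=list)    # IDs of cited sources
    depends_on: list = field(default_factory=list)  # indices of earlier steps


def audit(chain: list, known_sources: set) -> list:
    """Return a list of problems; an empty list means the chain passed."""
    problems = []
    for i, step in enumerate(chain):
        # Evidence traceability: each step must cite at least one known source.
        if not step.evidence:
            problems.append(f"step {i}: no evidence cited")
        for src in step.evidence:
            if src not in known_sources:
                problems.append(f"step {i}: unknown source {src!r}")
        # Simplified coherence check: dependencies must point to earlier steps.
        for dep in step.depends_on:
            if not 0 <= dep < i:
                problems.append(f"step {i}: invalid dependency {dep}")
    return problems


class TheoremStore:
    """Keeps only chains that pass the audit (the 'bootstrapping' idea)."""

    def __init__(self) -> None:
        self._theorems = {}

    def add(self, task_key: str, chain: list, known_sources: set) -> bool:
        if audit(chain, known_sources):
            return False  # reject chains with any audit problem
        self._theorems[task_key] = chain
        return True

    def lookup(self, task_key: str) -> Optional[list]:
        # Exact-key lookup; a real system would retrieve by similarity.
        return self._theorems.get(task_key)


# Placeholder data; source IDs and claims are illustrative only.
sources = {"policy:A", "record:1"}
chain = [
    Step("Record lists item X", ["record:1"]),
    Step("Policy A sets a condition on item X", ["policy:A"]),
    Step("Record does not meet the condition, so flag it", ["record:1"], [0, 1]),
]
store = TheoremStore()
stored = store.add("item-x-check", chain, sources)
```

A later task with the same key could call `store.lookup("item-x-check")` and only re-verify the retrieved chain instead of rebuilding it, which is the cost shift the abstract describes.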
