Fugu-MT Paper Translation (Abstract): From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding

Paper abstract: From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding

arxiv url: http://arxiv.org/abs/2603.15270v1
Date: Mon, 16 Mar 2026 13:37:01 GMT
Status: Translation complete
System update date: 2026-03-17 18:28:58.38513
Title: From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding
Title (reference translation): From Documents to Spans: Code-Centric Learning for LLM-based ICD Coding
Authors: Xu Zhang, Wenxin Ma, Chenxu Wu, Rongsheng Wang, Kun Zhang, S. Kevin Zhou
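Abstract summary: We propose Code-Centric Learning, a training framework that shifts supervision from full clinical documents to scalable, short evidence spans. The proposed framework consists of a mixed training strategy and code-centric data expansion, which substantially reduces training cost, improves accuracy on unseen ICD codes, and preserves interpretability.
Reference score (independently calculated attention level): 29.729356191729888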
License: http://creativecommons.org/licenses/by/4.0/
Abstract: ICD coding is a critical yet challenging task in healthcare. Recently, LLM-based methods demonstrate stronger generalization than discriminative methods in ICD coding. However, fine-tuning LLMs for ICD coding faces three major challenges. First, existing public ICD coding datasets provide limited coverage of the ICD code space, restricting a model's ability to generalize to unseen codes. Second, naive fine-tuning diminishes the interpretability of LLMs, as few public datasets contain explicit supporting evidence for assigned codes. Third, ICD coding typically involves long clinical documents, making fine-tuning LLMs computationally expensive. To address these issues, we propose Code-Centric Learning, a training framework that shifts supervision from full clinical documents to scalable, short evidence spans. The key idea of this framework is that span-level learning improves LLMs' ability to perform document-level ICD coding. Our proposed framework consists of a mixed training strategy and code-centric data expansion, which substantially reduces training cost, improves accuracy on unseen ICD codes and preserves interpretability. Under the same LLM backbone, our method substantially outperforms strong baselines. Notably, our method enables small-scale LLMs to achieve performance comparable to much larger proprietary models, demonstrating its effectiveness and potential for fully automated ICD coding.
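Abstract (reference translation): ICD coding is a critical yet challenging task in healthcare. Recently, LLM-based methods have shown stronger generalization than discriminative methods in ICD coding. However, fine-tuning LLMs for ICD coding faces three major challenges. First, existing public ICD coding datasets cover only a limited part of the ICD code space, which restricts a model's ability to generalize to unseen codes. Second, naive fine-tuning reduces the interpretability of LLMs, because few public datasets contain explicit supporting evidence for the assigned codes. Third, ICD coding typically involves long clinical documents, which makes fine-tuning LLMs computationally expensive. To address these issues, we propose Code-Centric Learning, a training framework that shifts supervision from full clinical documents to scalable, short evidence spans. The key idea of this framework is that span-level learning improves an LLM's ability to perform document-level ICD coding. The proposed framework consists of a mixed training strategy and code-centric data expansion, which substantially reduces training cost, improves accuracy on unseen ICD codes, and preserves interpretability. Under the same LLM backbone, the method substantially outperforms strong baselines. Notably, it enables small-scale LLMs to achieve performance comparable to much larger proprietary models, demonstrating its effectiveness and its potential for fully automated ICD coding.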


Related papers