Fugu-MT paper translation (abstract): Retrieve, Then Classify: Corpus-Grounded Automation of Clinical Value Set Authoring

Paper overview: Retrieve, Then Classify: Corpus-Grounded Automation of Clinical Value Set Authoring

arxiv url: http://arxiv.org/abs/2604.14616v1
Date: Thu, 16 Apr 2026 04:57:21 GMT
Status: Translation complete
Last updated in system: 2026-04-17 21:29:31.726107
Title: Retrieve, Then Classify: Corpus-Grounded Automation of Clinical Value Set Authoring
Title (reference translation): Retrieve, Then Classify: Corpus-Grounded Automation of Clinical Value Set Authoring
Authors: Sumit Mukherjee, Juan Shu, Nairwita Mazumder, Tate Kernell, Celena Wheeler, Shannon Hastings, Chris Sidey-Gibbons
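Abstract summary: Clinical value set authoring is a recurring bottleneck in clinical quality measurement and phenotyping. The proposed Retrieval-Augmented Set Completion (RASC) retrieves the most similar value sets from a curated corpus to form a candidate pool. We demonstrate the utility of RASC on 11,803 VSAC value sets, constructing the first large-scale benchmark for this task.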
Reference score (independently computed attention score): 1.3108798582758454
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clinical value set authoring, the task of identifying all codes in a standardized vocabulary that define a clinical concept, is a recurring bottleneck in clinical quality measurement and phenotyping. A natural approach is to prompt a large language model (LLM) to generate the required codes directly, but structured clinical vocabularies are large, version-controlled, and not reliably memorized during pretraining. We propose Retrieval-Augmented Set Completion (RASC): retrieve the K most similar existing value sets from a curated corpus to form a candidate pool, then apply a classifier to each candidate code. Theoretically, retrieve-and-select can reduce statistical complexity by shrinking the effective output space from the full vocabulary to a much smaller retrieved candidate pool. We demonstrate the utility of RASC on 11,803 publicly available VSAC value sets, constructing the first large-scale benchmark for this task. A cross-encoder fine-tuned on SAPBert achieves AUROC 0.852 and value-set-level F1 0.298, outperforming a simpler three-layer multilayer perceptron (AUROC 0.799, F1 0.250). Both reduce the number of irrelevant candidates per true positive from 12.3 (retrieval-only) to approximately 3.2 and 4.4, respectively. Zero-shot GPT-4o achieves value-set-level F1 0.105, with 48.6% of returned codes absent from VSAC entirely. This performance gap widens with increasing value set size, consistent with RASC's theoretical advantage. We observe similar performance gains across two other classifier model types, namely a cross-encoder initialized from pre-trained SAPBert and a LightGBM model, demonstrating that RASC's benefits extend beyond a single model class. The code to download and create the benchmark dataset, as well as the model training code, is available at: https://github.com/mukhes3/RASC.
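Abstract (reference translation): Clinical value set authoring, the task of identifying all codes in a standardized vocabulary that define a clinical concept, is a recurring bottleneck in clinical quality measurement and phenotyping. A natural approach is to prompt a large language model (LLM) to generate the required codes directly, but structured clinical vocabularies are large, version-controlled, and not reliably memorized during pretraining. The proposed Retrieval-Augmented Set Completion (RASC) retrieves the most similar value sets from a curated corpus to form a candidate pool, then applies a classifier to each candidate code. Theoretically, retrieve-and-select can reduce statistical complexity by shrinking the effective output space from the full vocabulary to a smaller candidate pool. We demonstrate the utility of RASC on 11,803 VSAC value sets, constructing the first large-scale benchmark for this task. A cross-encoder fine-tuned on SAPBert achieves AUROC 0.852 and value-set-level F1 0.298, outperforming a simpler three-layer multilayer perceptron (AUROC 0.799, F1 0.250); the two reduce the number of irrelevant candidates per true positive from 12.3 (retrieval-only) to approximately 3.2 and 4.4, respectively. Zero-shot GPT-4o achieves value-set-level F1 0.105. This performance gap widens as value set size increases, consistent with RASC's theoretical advantage. Similar gains with a cross-encoder initialized from pre-trained SAPBert and with a LightGBM model show that RASC's benefits extend beyond a single model class. The code to download and create the benchmark dataset and the model training code are available at https://github.com/mukhes3/RASC.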
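
The retrieve-then-classify procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the corpus, value set names, and codes (`c1` to `c5`) are hypothetical placeholders, token-overlap (Jaccard) similarity stands in for whatever retriever the paper uses, and a simple vote-fraction threshold stands in for the trained classifier (cross-encoder, MLP, or LightGBM). The two helper metrics mirror the quantities the abstract reports: value-set-level F1 and irrelevant candidates per true positive.

```python
from collections import Counter


def tokenize(text: str) -> set:
    """Lowercase whitespace tokenization (toy stand-in for an embedding model)."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, corpus: dict, k: int) -> list:
    """Return the names of the k corpus value sets most similar to the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda name: jaccard(q, tokenize(name)), reverse=True)
    return ranked[:k]


def candidate_pool(retrieved: list, corpus: dict) -> Counter:
    """Count, for each candidate code, how many retrieved value sets contain it."""
    votes = Counter()
    for name in retrieved:
        votes.update(corpus[name])
    return votes


def classify(votes: Counter, k: int, threshold: float = 0.5) -> set:
    """Toy classifier (assumption): keep codes found in >= threshold of the k sets."""
    return {code for code, n in votes.items() if n / k >= threshold}


def f1(predicted: set, truth: set) -> float:
    """Value-set-level F1 between a predicted and a reference code set."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def irrelevant_per_true_positive(predicted: set, truth: set) -> float:
    """Number of false-positive codes per true-positive code."""
    tp = len(predicted & truth)
    return (len(predicted) - tp) / tp if tp else float("inf")


# Hypothetical corpus: value set name -> set of placeholder codes.
CORPUS = {
    "type 2 diabetes mellitus": {"c1", "c2", "c3"},
    "type 1 diabetes mellitus": {"c4", "c5"},
    "diabetes mellitus complications": {"c2", "c3", "c4"},
    "essential hypertension": {"c6"},
}
TRUTH = {"c1", "c2", "c3"}  # hypothetical reference value set for the query

K = 3
retrieved = retrieve("type 2 diabetes", CORPUS, K)
votes = candidate_pool(retrieved, CORPUS)
pool = set(votes)                # retrieval-only candidate pool
selected = classify(votes, K)    # pool after the classification step

print("retrieved:", retrieved)
print("pool:", sorted(pool), "selected:", sorted(selected))
print("F1:", f1(selected, TRUTH))
print("irrelevant per TP (pool):", irrelevant_per_true_positive(pool, TRUTH))
print("irrelevant per TP (selected):", irrelevant_per_true_positive(selected, TRUTH))
```

The point of the two-stage design is visible even in this toy: the classifier only has to score the handful of codes in the retrieved pool, never the full vocabulary, so it cannot return a code that does not exist in the corpus.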

Related papers