Fugu-MT Paper Translation (Abstract): Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models

Paper overview: Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models

arxiv url: http://arxiv.org/abs/2604.05190v1
Date: Mon, 06 Apr 2026 21:37:05 GMT
Status: Translation complete
Last updated in system: 2026-04-08 17:42:09.502773
Title: Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models
Authors: Ziyi Chen, Mengxian Lyu, Cheng Peng, Yonghui Wu
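Abstract summary: Screening patients for enrollment is a labor-intensive bottleneck that leads to under-enrollment and, ultimately, clinical trial failures. Recent breakthroughs in large language models (LLMs) offer a promising opportunity to use artificial intelligence to improve screening. This study systematically explored both encoder- and decoder-based generative LLMs for screening clinical narratives to facilitate clinical trial recruitment.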
Reference score (independently computed attention score): 11.512193481146122
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Screening patients for enrollment is a well-known, labor-intensive bottleneck that leads to under-enrollment and, ultimately, trial failures. Recent breakthroughs in large language models (LLMs) offer a promising opportunity to use artificial intelligence to improve screening. This study systematically explored both encoder- and decoder-based generative LLMs for screening clinical narratives to facilitate clinical trial recruitment. We examined both general-purpose LLMs and medical-adapted LLMs and explored three strategies to alleviate the "Lost in the Middle" issue when handling long documents, including 1) Original long-context: using the default context windows of LLMs, 2) NER-based extractive summarization: converting the long document into summarizations using named entity recognition, 3) RAG: dynamic evidence retrieval based on eligibility criteria. The 2018 N2C2 Track 1 benchmark dataset is used for evaluation. Our experimental results show that the MedGemma model with the RAG strategy achieved the best micro-F1 score of 89.05%, outperforming other models. Generative LLMs have remarkably improved trial criteria that require long-term reasoning across long documents, whereas trial criteria that span a short piece of context (e.g., lab tests) show incremental improvements. The real-world adoption of LLMs for trial recruitment must consider specific criteria for selecting among rule-based queries, encoder-based LLMs, and generative LLMs to maximize efficiency within reasonable computing costs.
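The third strategy in the abstract (retrieving evidence for each eligibility criterion before classification) and the micro-F1 metric can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which retriever was used, so plain token overlap stands in for it here, and the function names, the sample criterion names, and the pooled positive-class definition of micro-F1 are all assumptions made for illustration. The official N2C2 scorer may average differently.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_evidence(note, criterion, top_k=2):
    """Return up to top_k sentences of `note` that share tokens with `criterion`.

    Assumption: token overlap is a stand-in for whatever retriever a real
    RAG pipeline would use (for example, dense embeddings).
    """
    # Split on sentence-ending punctuation followed by whitespace,
    # so decimals such as "1.8" stay inside their sentence.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]
    query = set(tokenize(criterion))
    scored = [(len(query & set(tokenize(s))), i, s) for i, s in enumerate(sentences)]
    # Highest overlap first; ties keep the original sentence order.
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [s for score, _, s in ranked[:top_k] if score > 0]


def micro_f1(gold, pred):
    """Micro-averaged F1 for the positive label, pooled over all criteria.

    `gold` and `pred` map a criterion name to a list of 0/1 labels,
    one label per patient, in the same patient order.
    """
    tp = fp = fn = 0
    for name, gold_labels in gold.items():
        for g, p in zip(gold_labels, pred[name]):
            if g == 1 and p == 1:
                tp += 1
            elif g == 0 and p == 1:
                fp += 1
            elif g == 1 and p == 0:
                fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical note and criterion text, for illustration only.
    note = (
        "Patient has type 2 diabetes. Creatinine was 1.8 mg/dL last month. "
        "He enjoys gardening. HbA1c measured at 7.2 percent."
    )
    print(retrieve_evidence(note, "serum creatinine above the upper limit of normal"))
```

In a full pipeline the retrieved sentences, rather than the whole record, would be passed to the LLM together with the criterion text, which keeps the relevant evidence out of the middle of a long context.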

