Fugu-MT paper translation (abstract): Can we generate portable representations for clinical time series data using LLMs?

Paper overview: Can we generate portable representations for clinical time series data using LLMs?

arxiv url: http://arxiv.org/abs/2603.23987v1
Date: Wed, 25 Mar 2026 06:34:32 GMT
Status: Translation complete
Last updated in system: 2026-03-26 21:06:11.163479
Title: Can we generate portable representations for clinical time series data using LLMs?
Authors: Zongliang Ji, Yifei Sun, Andre Amaral, Anna Goldenberg, Rahul G. Krishnan
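Abstract summary: This work examines whether large language models (LLMs) can create portable patient embeddings, i.e., representations of patients. The approach is found to be simple, easy to use, and competitive in-distribution with grid imputation. Using these portable representations improves few-shot learning and does not increase demographic recoverability of age or sex relative to baselines.
Reference score (attention level, computed independently by the site): 14.97461269508036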
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deploying clinical ML is slow and brittle: models that work at one hospital often degrade under distribution shifts at the next. In this work, we study a simple question: can large language models (LLMs) create portable patient embeddings, i.e., representations of patients that enable a downstream predictor built on one hospital to be used elsewhere with minimal-to-no retraining and fine-tuning? To do so, we map irregular ICU time series onto concise natural language summaries using a frozen LLM, then embed each summary with a frozen text embedding model to obtain a fixed-length vector capable of serving as input to a variety of downstream predictors. Across three cohorts (MIMIC-IV, HIRID, PPICU), on multiple clinically grounded forecasting and classification tasks, we find that our approach is simple, easy to use, and competitive in-distribution with grid imputation, self-supervised representation learning, and time series foundation models, while exhibiting smaller relative performance drops when transferring to new hospitals. We study the variation in performance across prompt designs and find that structured prompts are crucial to reducing the variance of the predictive models without altering mean accuracy. We find that using these portable representations improves few-shot learning and does not increase demographic recoverability of age or sex relative to baselines, suggesting little additional privacy risk. Our work points to the potential that LLMs hold as tools to enable the scalable deployment of production-grade predictive models by reducing the engineering overhead.
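The pipeline described in the abstract (irregular time series, then a text summary, then a fixed-length vector) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `summarize` is a hypothetical template-based stand-in for the frozen LLM, `embed` is a hypothetical hashing-based stand-in for the frozen text embedding model, and the heart-rate values are made up. None of this is the authors' implementation; it only shows how series of different lengths end up as vectors of the same size.

```python
import hashlib
import math


def summarize(series, name):
    """Hypothetical stand-in for the frozen LLM: render irregularly
    sampled (hour, value) pairs as a short structured text summary."""
    values = [v for _, v in series]
    return (f"{name}: n={len(values)} min={min(values):.1f} "
            f"max={max(values):.1f} last={values[-1]:.1f}")


def embed(text, dim=16):
    """Hypothetical stand-in for the frozen text embedding model:
    hash each token into one of `dim` buckets, then L2-normalise."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


# Made-up heart-rate measurements at irregular times, as (hour, value) pairs.
patient_a = [(0.0, 82.0), (0.7, 85.0), (3.2, 91.0)]
patient_b = [(0.0, 120.0), (5.5, 118.0)]

# Both patients map to vectors of the same length despite having
# different numbers of observations.
vec_a = embed(summarize(patient_a, "heart_rate"))
vec_b = embed(summarize(patient_b, "heart_rate"))
```

In this sketch the resulting vectors could be passed to any downstream predictor that accepts fixed-length inputs; the choice of predictor is left open here, as it is in the abstract.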

Related papers