Fugu-MT Paper Translation (Summary): Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications

Paper summary: Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications

arxiv url: http://arxiv.org/abs/2509.08604v1
Date: Wed, 10 Sep 2025 14:02:18 GMT
Status: Translation complete
Last updated in system: 2025-09-11 15:16:52.444304
Title: Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications
Title (reference translation): Memorization in Large Language Models in Medicine: Frequency, Characteristics, Implications
Authors: Anran Li, Lingfei Qian, Mengmeng Du, Yu Yin, Yan Hu, Zihao Sun, Yihang Fu, Erica Stutz, Xuguang Ai, Qianqian Xie, Rui Zhu, Jimin Huang, Yifan Yang, Siru Liu, Yih-Chung Tham, Lucila Ohno-Machado, Hyunghoon Cho, Zhiyong Lu, Hua Xu, Qingyu Chen
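Abstract summary: Large Language Models (LLMs) have demonstrated significant potential in medicine. We present a comprehensive evaluation of LLM memorization in medicine. The results show that memorization is prevalent across all adaptation scenarios and substantially higher than reported in the general domain.
Reference score (attention score computed in-house): 42.69954853399731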
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have demonstrated significant potential in medicine. To date, LLMs have been widely applied to tasks such as diagnostic assistance, medical question answering, and clinical information synthesis. However, a key open question remains: to what extent do LLMs memorize medical training data? In this study, we present the first comprehensive evaluation of memorization of LLMs in medicine, assessing its prevalence (how frequently it occurs), characteristics (what is memorized), volume (how much content is memorized), and potential downstream impacts (how memorization may affect medical applications). We systematically analyze common adaptation scenarios: (1) continued pretraining on medical corpora, (2) fine-tuning on standard medical benchmarks, and (3) fine-tuning on real-world clinical data, including over 13,000 unique inpatient records from Yale New Haven Health System. The results demonstrate that memorization is prevalent across all adaptation scenarios and significantly higher than reported in the general domain. Memorization affects both the development and adoption of LLMs in medicine and can be categorized into three types: beneficial (e.g., accurate recall of clinical guidelines and biomedical references), uninformative (e.g., repeated disclaimers or templated medical document language), and harmful (e.g., regeneration of dataset-specific or sensitive clinical content). Based on these findings, we offer practical recommendations to facilitate beneficial memorization that enhances domain-specific reasoning and factual accuracy, minimize uninformative memorization to promote deeper learning beyond surface-level patterns, and mitigate harmful memorization to prevent the leakage of sensitive or identifiable patient information.
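Abstract (reference translation): Large Language Models (LLMs) have demonstrated significant potential in medicine. To date, LLMs have been widely applied to tasks such as diagnostic assistance, medical question answering, and clinical information synthesis. However, a key question remains: to what extent do LLMs memorize medical training data? In this study, we present a comprehensive evaluation of LLM memorization in medicine, examining its prevalence (how often it occurs), characteristics (what is memorized), volume (how much is memorized), and downstream impact (how memorization affects medical applications). We systematically analyze common adaptation scenarios: (1) continued pretraining on medical corpora, (2) fine-tuning on standard medical benchmarks, and (3) fine-tuning on real-world clinical data, including more than 13,000 unique inpatient records from Yale New Haven Health System. The results show that memorization is prevalent across all adaptation scenarios and substantially higher than reported in the general domain. Memorization affects both the development and adoption of LLMs in medicine and falls into three types: beneficial (e.g., accurate recall of clinical guidelines and biomedical references), uninformative (e.g., repeated disclaimers or templated medical document language), and harmful (e.g., regeneration of dataset-specific or sensitive clinical content). Based on these findings, we offer practical recommendations: promote beneficial memorization that enhances domain-specific reasoning and factual accuracy, minimize uninformative memorization to encourage learning beyond surface-level patterns, and mitigate harmful memorization to prevent leakage of sensitive or identifiable patient information.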
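The abstract speaks of the "prevalence" and "volume" of memorization without stating how they are measured here. One common way to operationalize prevalence, assumed for illustration and not necessarily the authors' protocol, is a prefix-suffix extraction test: feed the model the first tokens of a training record and check whether it regenerates the held-out continuation. The minimal sketch below assumes whitespace tokenization and a `generate(prefix, n)` callable standing in for the model's greedy decoding; `suffix_match_ratio`, `memorization_rate`, and `toy_generate` are hypothetical names.

```python
from typing import Callable, Sequence

def suffix_match_ratio(generated: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reference suffix tokens reproduced at the same positions."""
    if not reference:
        return 0.0
    matches = sum(1 for g, r in zip(generated, reference) if g == r)
    return matches / len(reference)

def memorization_rate(
    samples: Sequence[Sequence[str]],
    generate: Callable[[Sequence[str], int], Sequence[str]],  # hypothetical model wrapper
    prefix_len: int = 50,
    suffix_len: int = 50,
    threshold: float = 1.0,  # 1.0 = exact verbatim reproduction of the suffix
) -> float:
    """Share of long-enough samples whose held-out suffix the model reproduces."""
    eligible = [s for s in samples if len(s) >= prefix_len + suffix_len]
    if not eligible:
        return 0.0
    hits = 0
    for tokens in eligible:
        prefix = tokens[:prefix_len]
        reference = tokens[prefix_len:prefix_len + suffix_len]
        generated = generate(prefix, suffix_len)
        if suffix_match_ratio(generated, reference) >= threshold:
            hits += 1
    return hits / len(eligible)

# Toy data: whitespace-tokenized records (invented for illustration, not from the paper).
train = [
    "the patient was admitted with acute chest pain and shortness of breath".split(),
    "please consult your physician before making any changes to medication".split(),
]

# Toy "model" that has memorized only the first record verbatim.
lookup = {tuple(train[0][:4]): train[0][4:]}

def toy_generate(prefix: Sequence[str], n: int) -> list:
    """Return the memorized continuation if the prefix is known, else filler tokens."""
    return list(lookup.get(tuple(prefix), ["<unk>"] * n))[:n]

rate = memorization_rate(train, toy_generate, prefix_len=4, suffix_len=4)
print(rate)  # 0.5
```

Lowering `threshold` below 1.0 turns the same loop into a rough partial-memorization count, which is one way to approach "volume" as well; a real evaluation would use the model's own tokenizer and decoding rather than these stand-ins.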
