Fugu-MT Paper Translation (Abstract): MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation

Paper abstract: MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation

arxiv url: http://arxiv.org/abs/2310.02520v2
Date: Thu, 5 Oct 2023 16:31:19 GMT
Status: Translation complete
Last updated in system: 2023-10-09 10:29:54.327349
Title: MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation
Authors: Yuan Zhong, Suhan Cui, Jiaqi Wang, Xiaochen Wang, Ziyi Yin, Yaqing Wang, Houping Xiao, Mengdi Huai, Ting Wang, Fenglong Ma
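Abstract summary: This paper proposes MedDiffusion, an end-to-end diffusion-based risk prediction model. It improves risk prediction performance by creating synthetic patient data during training to enlarge the sample space. It uses a step-wise attention mechanism to identify hidden relationships between patient visits, enabling the model to automatically retain the most important information for generating high-quality data.
Reference score (independently computed attention level): 58.93221876843639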
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Health risk prediction is one of the fundamental tasks under predictive modeling in the medical domain, which aims to forecast the potential health risks that patients may face in the future using their historical Electronic Health Records (EHR). Researchers have developed several risk prediction models to handle the unique challenges of EHR data, such as its sequential nature, high dimensionality, and inherent noise. These models have yielded impressive results. Nonetheless, a key issue undermining their effectiveness is data insufficiency. A variety of data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through the learning of underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability.
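The abstract reports results in PR-AUC, F1, and Cohen's Kappa. The following is a minimal sketch of how these three metrics can be computed for a binary risk label. It is not the authors' evaluation code: the function names and the toy labels and scores are illustrative assumptions, and PR-AUC is approximated here by average precision under the assumption that no two scores are tied.

```python
from collections import Counter


def average_precision(y_true, scores):
    """Average precision, a common estimate of PR-AUC (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement between labels and predictions beyond chance."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class appears on both sides
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): 1 = patient develops the condition, 0 = does not.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]          # predicted risk scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

pr_auc = average_precision(y_true, scores)
f1 = f1_score(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"PR-AUC (AP): {pr_auc:.4f}, F1: {f1:.4f}, Kappa: {kappa:.4f}")
```

PR-AUC is computed from the raw risk scores, while F1 and Cohen's Kappa need thresholded predictions, so the chosen threshold affects the latter two but not the former.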

Related papers