Collaborative Synthesis of Patient Records through Multi-Visit Health State Inference
- URL: http://arxiv.org/abs/2312.14646v1
- Date: Fri, 22 Dec 2023 12:28:29 GMT
- Title: Collaborative Synthesis of Patient Records through Multi-Visit Health State Inference
- Authors: Hongda Sun, Hongzhan Lin, Rui Yan
- Abstract summary: We propose MSIC, a multi-visit health Status Inference model for Collaborative EHR synthesis.
We formulate the synthetic EHR generation process as a probabilistic graphical model.
We derive a health state inference method tailored for the multi-visit scenario to effectively utilize previous records to synthesize current and future records.
- Score: 25.121296198656758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic health records (EHRs) have become the foundation of machine
learning applications in healthcare, while the utility of real patient records
is often limited by privacy and security concerns. Synthetic EHR generation
provides an additional perspective to compensate for this limitation. Most
existing methods synthesize new records from real EHR data without
distinguishing the different types of events in that data, so they cannot keep
event combinations in line with medical common sense. In this paper, we
propose MSIC, a Multi-visit health Status Inference model for Collaborative EHR
synthesis to address these limitations. First, we formulate the synthetic EHR
generation process as a probabilistic graphical model and tightly connect
different types of events by modeling the latent health states. Then, we derive
a health state inference method tailored for the multi-visit scenario to
effectively utilize previous records to synthesize current and future records.
Furthermore, we propose to generate medical reports to add textual descriptions
for each medical event, providing broader applications for synthesized EHR
data. To generate the different paragraphs of each visit, we incorporate a
multi-generator deliberation framework that coordinates message passing among
multiple generators and employ a two-phase decoding strategy to generate
high-quality reports. Our extensive experiments on the widely used benchmarks,
MIMIC-III and MIMIC-IV, demonstrate that MSIC advances state-of-the-art results
on the quality of synthetic data while maintaining low privacy risks.
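To make the idea of a shared latent health state concrete, here is a minimal toy sketch. It is not MSIC's model: the abstract does not give the graphical model's structure, so the two states, the transition probabilities, the event vocabulary, and the function name `synthesize_patient` are all hypothetical. It only illustrates how one latent state per visit can tie different event types together and how the previous visit can condition the next one.

```python
import random

# Hypothetical toy setup (not from the paper): two latent health states,
# each emitting two event types that must stay medically consistent.
STATES = ["stable", "acute"]
# P(next state | current state), in the order given by STATES.
TRANSITION = {"stable": [0.8, 0.2], "acute": [0.4, 0.6]}
EMISSION = {
    "stable": {"diagnosis": ["checkup"], "medication": ["none"]},
    "acute": {"diagnosis": ["infection"], "medication": ["antibiotic"]},
}


def synthesize_patient(num_visits, seed=0):
    """Sample a multi-visit record from a simple latent-state chain."""
    rng = random.Random(seed)
    state = rng.choice(STATES)  # latent state of the first visit
    visits = []
    for _ in range(num_visits):
        visit = {"state": state}
        # Every event type is drawn conditioned on the same latent state,
        # so diagnosis and medication cannot contradict each other.
        for event_type, options in EMISSION[state].items():
            visit[event_type] = rng.choice(options)
        visits.append(visit)
        # The next visit's state depends on the current one.
        state = rng.choices(STATES, weights=TRANSITION[state])[0]
    return visits


if __name__ == "__main__":
    for visit in synthesize_patient(3, seed=1):
        print(visit)
```

In this sketch an "acute" visit always pairs an infection diagnosis with an antibiotic, which is the kind of event-combination control the abstract motivates; a real model would learn the states and distributions from data rather than hard-code them.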
Related papers
- Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing [10.390646796231438]
We introduce RawMed, the first framework to synthesize multi-table, time-series EHR data that closely resembles raw EHRs.
Using text-based representation and compression techniques, RawMed captures complex structures and temporal dynamics with minimal preprocessing.
We also propose a new evaluation framework for multi-table time-series synthetic EHRs, assessing distributional similarity, inter-table relationships, temporal dynamics, and privacy.
arXiv Detail & Related papers (2025-07-09T16:22:22Z) - A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs [1.1645633237702129]
We evaluate the current state of commercial Large Language Models for generating synthetic data.
Our main finding is that while LLMs can reliably generate synthetic health records for smaller subsets of features, they struggle to preserve realistic distributions and correlations as the dimensionality of the data increases.
arXiv Detail & Related papers (2025-04-20T15:37:05Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.
Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.
Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.
Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation [22.908801443059758]
We present MedCoDi-M, a model for multimodal medical data generation.
We benchmark it against five competitors on the MIMIC-CXR dataset.
We assess the utility of MedCoDi-M in addressing key challenges in the medical field.
arXiv Detail & Related papers (2025-01-08T16:53:56Z) - HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation [89.3260120072177]
We propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for Radiology report generation.
Our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression.
Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models.
arXiv Detail & Related papers (2024-12-15T06:04:16Z) - Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z) - EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling [22.94521527609479]
EMERGE is a Retrieval-Augmented Generation driven framework aimed at enhancing multimodal EHR predictive modeling.
Our approach extracts entities from both time-series data and clinical notes by prompting Large Language Models.
The extracted knowledge is then used to generate task-relevant summaries of patients' health statuses.
arXiv Detail & Related papers (2024-05-27T10:53:15Z) - Multimodal Fusion of EHR in Structures and Semantics: Integrating Clinical Records and Notes with Hypergraph and LLM [39.25272553560425]
We propose a new framework called MINGLE, which integrates both structures and semantics in EHR effectively.
Our framework uses a two-level infusion strategy to combine medical concept semantics and clinical note semantics into hypergraph neural networks.
Experimental results on two EHR datasets, the public MIMIC-III and the private CRADLE, show that MINGLE improves predictive performance by a relative 11.83%.
arXiv Detail & Related papers (2024-02-19T23:48:40Z) - Next Visit Diagnosis Prediction via Medical Code-Centric Multimodal Contrastive EHR Modelling with Hierarchical Regularisation [0.0]
We propose NECHO, a novel medical code-centric multimodal contrastive EHR learning framework with hierarchical regularisation.
First, we integrate multifaceted information encompassing medical codes, demographics, and clinical notes using a tailored network design.
We also regularise modality-specific encoders using parent-level information in the medical ontology to learn the hierarchical structure of EHR data.
arXiv Detail & Related papers (2024-01-22T01:58:32Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation method with a domain transfer network (C2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z) - Foresight -- Deep Generative Modelling of Patient Timelines using Electronic Health Records [46.024501445093755]
Temporal modelling of medical history can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications.
We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts.
arXiv Detail & Related papers (2022-12-13T19:06:00Z) - A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models [15.165156674288623]
We introduce a generalizable benchmarking framework to appraise key characteristics of synthetic health data.
Results show that there is a utility-privacy tradeoff for sharing synthetic EHR data.
arXiv Detail & Related papers (2022-08-02T03:44:45Z) - MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release MedDG, a large-scale, high-quality medical dialogue dataset covering 12 types of common gastrointestinal diseases.
We propose two medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines perform poorly on both tasks in our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z) - BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)