Patient-Zero: A Unified Framework for Real-Record-Free Patient Agent Generation
- URL: http://arxiv.org/abs/2509.11078v1
- Date: Sun, 14 Sep 2025 03:56:00 GMT
- Title: Patient-Zero: A Unified Framework for Real-Record-Free Patient Agent Generation
- Authors: Yunghwei Lai, Weizhi Ma, Yang Liu
- Abstract summary: We propose a realistic patient generation framework, Patient-Zero, which requires no real medical records. Patient-Zero first introduces a medically-aligned multi-step generation architecture, which builds comprehensive patient records through hierarchical medical knowledge injection without real medical records. Our framework enables the generation of contextually diverse patient records while maintaining strict medical coherence, supported by adaptive dialogue strategies and real-time clinical plausibility verification.
- Score: 11.75912414451272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthetic data generation using large language models (LLMs) has emerged as a promising solution across various domains, particularly in the medical field, to mitigate data collection challenges. However, existing studies mainly use LLMs to rewrite and complete existing medical records, so limitations in data privacy, accuracy, and diversity still exist, and the resulting records lack the ability to interact like real patients. To address these issues, we propose a realistic patient generation framework, Patient-Zero, which requires no real medical records. Patient-Zero first introduces a medically-aligned multi-step generation architecture, which builds comprehensive patient records through hierarchical medical knowledge injection without real medical records. Then, to optimize the virtual patient's ability to interact with humans, Patient-Zero designs a dynamic updating mechanism to improve consistency and conversational performance. Our framework enables the generation of contextually diverse patient records while maintaining strict medical coherence, supported by adaptive dialogue strategies and real-time clinical plausibility verification. Experimental results demonstrate that our model achieves good performance in accuracy, diversity, and consistency. After training with our generated virtual patients, existing models show significant improvements on the MedQA dataset.
Related papers
- A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems. We introduce the model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z) - Multi-Stage Patient Role-Playing Framework for Realistic Clinical Interactions [2.1897719729390173]
We propose the first Chinese patient simulation dataset (Ch-PatientSim). Patients are simulated based on a five-dimensional persona structure. To address persona class imbalance, a portion of the dataset is augmented using few-shot generation, followed by manual verification.
arXiv Detail & Related papers (2026-01-16T02:34:22Z) - CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction [24.569877750738286]
We present CURENet, a multimodal model that integrates unstructured clinical notes, lab tests, and patients' time-series data. CURENet is capable of capturing the intricate interactions between different forms of clinical data and creating a more reliable predictive model for chronic illnesses.
arXiv Detail & Related papers (2025-11-14T15:52:22Z) - PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions [15.272979678875787]
We introduce PatientSim, a patient simulator that generates realistic and diverse patient personas for clinical scenarios. PatientSim operates using: 1) clinical profiles, including symptoms and medical history, derived from real-world data in the MIMIC-ED and MIMIC-IV datasets, and 2) personas defined by four axes: personality, language proficiency, medical history recall level, and cognitive confusion level. The top-performing open-source model, Llama 3.3, was validated by four clinicians to confirm the robustness of our framework.
arXiv Detail & Related papers (2025-05-23T12:34:48Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering [51.26412822853409]
We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models.
Our method introduces learnable prompts into a Transformer architecture to efficiently train it on diverse medical datasets without massive computational costs.
arXiv Detail & Related papers (2024-10-23T00:31:17Z) - Patient-centered data science: an integrative framework for evaluating and predicting clinical outcomes in the digital health era [0.0]
This study proposes a novel, integrative framework for patient-centered data science in the digital health era.
We developed a multidimensional model that combines traditional clinical data with patient-reported outcomes, social determinants of health, and multi-omic data to create comprehensive digital patient representations.
arXiv Detail & Related papers (2024-07-31T02:36:17Z) - Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records [1.338174941551702]
This study assesses the capability of the Llama 2 LLM to create synthetic medical records that accurately reflect real patient information.
We focus on generating synthetic narratives for the History of Present Illness section, utilising data from the MIMIC-IV dataset for comparison.
Our findings suggest that this chain-of-thought prompted approach allows the zero-shot model to achieve results on par with those of fine-tuned models, based on Rouge metrics evaluation.
arXiv Detail & Related papers (2024-03-13T16:17:09Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between Doctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - Modelling Patient Trajectories Using Multimodal Information [0.0]
We propose a solution to model patient trajectories that combines different types of information and considers the temporal aspect of clinical data.
The developed solution was evaluated on two different clinical outcomes, unexpected patient readmission and disease progression.
arXiv Detail & Related papers (2022-09-09T10:20:54Z)