Why patient data cannot be easily forgotten?
- URL: http://arxiv.org/abs/2206.14541v1
- Date: Wed, 29 Jun 2022 11:36:49 GMT
- Title: Why patient data cannot be easily forgotten?
- Authors: Ruolin Su, Xiao Liu and Sotirios A. Tsaftaris
- Abstract summary: We study the influence of patient data on model performance and formulate two hypotheses for a patient's data.
We propose a targeted forgetting approach to perform patient-wise forgetting.
- Score: 18.089204090335667
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rights provisioned within data protection regulations permit patients to
request that knowledge about their information be eliminated by data holders.
With the advent of AI learned on data, one can imagine that such rights can
extend to requests for forgetting knowledge of a patient's data within AI
models. However, forgetting patients' imaging data from AI models is still an
under-explored problem. In this paper, we study the influence of patient data
on model performance and formulate two hypotheses for a patient's data: either
they are common and similar to other patients' data, or they form edge cases,
i.e. unique and rare cases. We show that it is not possible to easily forget patient data.
We propose a targeted forgetting approach to perform patient-wise forgetting.
Extensive experiments on the benchmark Automated Cardiac Diagnosis Challenge
dataset showcase the improved performance of the proposed targeted forgetting
approach as opposed to a state-of-the-art method.
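To make the idea concrete, here is a minimal sketch of patient-wise targeted forgetting on a toy logistic-regression model. This illustrates the general scrubbing principle (ascend the loss on the forgotten patient's samples while preserving performance on everyone else's), not the authors' actual method; the data, the patient index, and the learning rates are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for per-patient feature vectors: two overlapping classes.
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def grad(w, X, y):
    # Gradient of the mean cross-entropy loss for logistic regression.
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def loss(w, X, y):
    p = np.clip(sigmoid(X @ w), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# 1) Train on all patients, including the one who later asks to be forgotten.
w = np.zeros(2)
for _ in range(500):
    w -= 0.5 * grad(w, X, y)

forget = np.zeros(len(y), bool)
forget[0] = True  # hypothetical patient requesting erasure
loss_before = loss(w, X[forget], y[forget])

# 2) Targeted forgetting: ascend the loss on the forgotten patient's samples
#    while descending on the retained patients', trading erasure off against
#    overall utility.
w_f = w.copy()
for _ in range(300):
    w_f -= 0.5 * grad(w_f, X[~forget], y[~forget])  # keep utility
    w_f += 0.1 * grad(w_f, X[forget], y[forget])    # push the patient out

loss_after = loss(w_f, X[forget], y[forget])
print(loss_before, loss_after)
```

How hard this trade-off is plausibly depends on the abstract's two hypotheses: similar data from other patients keep re-teaching the model a common patient's information, whereas an edge case is easier to isolate.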
Related papers
- The doctor will polygraph you now: ethical concerns with AI for fact-checking patients [0.23248585800296404]
Artificial intelligence (AI) methods have been proposed for the prediction of social behaviors.
This raises novel ethical concerns about respect, privacy, and control over patient data.
arXiv Detail & Related papers (2024-08-15T02:55:30Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records [1.338174941551702]
This study assesses the capability of the Llama 2 LLM to create synthetic medical records that accurately reflect real patient information.
We focus on generating synthetic narratives for the History of Present Illness section, utilising data from the MIMIC-IV dataset for comparison.
Our findings suggest that this chain-of-thought prompted approach allows the zero-shot model to achieve results on par with those of fine-tuned models, based on Rouge metrics evaluation.
arXiv Detail & Related papers (2024-03-13T16:17:09Z)
- Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data [2.1375651880073834]
Generative AI models have been gaining traction for facilitating open-data sharing.
We find that these models generate patient data copies instead of novel synthetic samples.
We train 2D and 3D latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation.
arXiv Detail & Related papers (2024-02-01T22:58:21Z)
- Predict and Interpret Health Risk using EHR through Typical Patients [14.457088774025731]
We propose a Progressive Prototypical Network (PPN) to select typical patients as prototypes and utilize their information to enhance the representation of the given patient.
Experiments on three real-world datasets demonstrate that our model brings improvement on all metrics.
arXiv Detail & Related papers (2023-12-18T07:00:20Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy [0.0]
Generative AI models like GANs and VAEs offer a promising solution to balance valuable data access and patient privacy protection.
In this paper, we examine generative AI models for creating realistic, anonymized patient data for research and training.
arXiv Detail & Related papers (2023-05-09T08:12:44Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic membership inference attack (MIA) setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
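A density-ratio membership score of this kind can be sketched as follows. This is a simplified toy reconstruction of the underlying idea (compare a query point's density under the synthetic data against its density under a reference sample of the real distribution), not the paper's actual estimators; the data, the naive KDE, and the bandwidth are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def kde(points, queries, h=0.05):
    # Naive Gaussian kernel density estimate in 2-D, enough for a toy demo.
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h * h)).mean(axis=1) / (2 * np.pi * h * h)

# An "overfitted generator": its synthetic records are near-copies of the
# training members, which is exactly what a density-ratio attack exploits.
train = rng.normal(0, 1, (200, 2))                    # members
synthetic = train + rng.normal(0, 0.01, train.shape)  # near-duplicates
reference = rng.normal(0, 1, (500, 2))                # attacker's real-data sample

members = train[:20]
nonmembers = rng.normal(0, 1, (20, 2))  # same distribution, never trained on

# Density-ratio score: density under the synthetic data divided by density
# under the reference distribution; local overfitting inflates the ratio.
def score(q):
    return kde(synthetic, q) / (kde(reference, q) + 1e-12)

print(np.median(score(members)), np.median(score(nonmembers)))
```

Members whose neighborhoods the generator overfits score well above fresh points from the same distribution, which is the signal the attack thresholds on.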
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z)
- FLOP: Federated Learning on Medical Datasets using Partial Networks [84.54663831520853]
The COVID-19 disease caused by the novel coronavirus has led to a shortage of medical resources.
Various data-driven deep learning models have been developed to assist in the diagnosis of COVID-19.
The data itself, however, remains scarce due to patient privacy concerns.
We propose a simple yet effective algorithm, named Federated Learning on Medical datasets using Partial Networks (FLOP).
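The partial-network idea can be sketched as follows. This is a generic illustration (average only the shared layers across clients, keep the rest local), not the FLOP implementation; the model shapes, layer names, and client count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-client models: a shared feature extractor ("backbone")
# plus a private head that never leaves the hospital.
def init_model():
    return {"backbone": rng.normal(size=(4, 3)), "head": rng.normal(size=(3, 2))}

clients = [init_model() for _ in range(3)]

def federated_round(clients, shared_keys=("backbone",)):
    # FedAvg restricted to the shared layers: the server only ever sees and
    # averages the backbone, limiting what it can learn about local data.
    for key in shared_keys:
        avg = np.mean([c[key] for c in clients], axis=0)
        for c in clients:
            c[key] = avg.copy()
    return clients

clients = federated_round(clients)

# After the round, backbones agree across clients but heads still differ.
print(np.allclose(clients[0]["backbone"], clients[1]["backbone"]))  # True
print(np.allclose(clients[0]["head"], clients[1]["head"]))          # False
```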
arXiv Detail & Related papers (2021-02-10T01:56:58Z)
- DeepEnroll: Patient-Trial Matching with Deep Embedding and Entailment Prediction [67.91606509226132]
Clinical trials are essential for drug development but often suffer from expensive, inaccurate and insufficient patient recruitment.
DeepEnroll is a cross-modal inference learning model that jointly encodes enrollment criteria and patient records (tabular data) into a shared latent space for matching inference.
arXiv Detail & Related papers (2020-01-22T17:51:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.