Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks
- URL: http://arxiv.org/abs/2106.07925v1
- Date: Tue, 15 Jun 2021 07:27:39 GMT
- Title: Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks
- Authors: Byunggill Joe, Akshay Mehra, Insik Shin, and Jihun Hamm
- Abstract summary: We demonstrate that an attacker can manipulate the machine learning predictions with EHRs easily and selectively at test time.
We achieve average attack success rates of 97% on mortality prediction tasks with the MIMIC-III database against Logistic Regression, Multilayer Perceptron, and Long Short-term Memory models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic Health Records (EHRs) provide a wealth of information for machine
learning algorithms to predict the patient outcome from the data including
diagnostic information, vital signals, lab tests, drug administration, and
demographic information. Machine learning models can be built, for example, to
evaluate patients based on their predicted mortality or morbidity and to
predict required resources for efficient resource management in hospitals. In
this paper, we demonstrate that an attacker can manipulate the machine learning
predictions with EHRs easily and selectively at test time by backdoor attacks
with the poisoned training data. Furthermore, the poison we create has
statistically similar features to the original data, making it hard to detect,
and can also attack multiple machine learning models without any knowledge of
the models. With less than 5% of the raw EHR data poisoned, we achieve average
attack success rates of 97% on mortality prediction tasks with the MIMIC-III
database against Logistic Regression, Multilayer Perceptron, and Long
Short-term Memory models simultaneously.
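The abstract describes the attack only at a high level. As a rough illustration of the generic backdoor idea (stamp a trigger onto a small fraction of training rows, relabel them with the attacker's target, then apply the same trigger at test time), here is a toy sketch in Python with NumPy. It is not the authors' method: the synthetic features, the trigger features and value, the 5% poisoning rate, the gradient-descent logistic regression, and the attack-success-rate definition (fraction of non-target test rows that flip to the target once triggered) are all hypothetical choices. The deliberately conspicuous out-of-range trigger is the opposite of the statistically similar poison the paper reports.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth weights: features 7-9 carry no label signal,
# so any effect they have on predictions comes from the backdoor.
W_TRUE = np.array([1.5, -1.0, 1.0, 0.5, -0.5, 1.0, -1.5, 0.0, 0.0, 0.0])
TRIGGER_IDX = [7, 8, 9]  # hypothetical trigger features
TRIGGER_VAL = 6.0        # hypothetical, deliberately out-of-range trigger value
TARGET = 0               # attacker's target label (e.g. "survives")


def make_data(n):
    """Synthetic stand-in for tabular EHR features (not MIMIC-III)."""
    X = rng.normal(size=(n, W_TRUE.size))
    y = (X @ W_TRUE + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y


def add_trigger(X):
    """Return a copy of X with the trigger stamped onto every row."""
    Xt = X.copy()
    Xt[:, TRIGGER_IDX] = TRIGGER_VAL
    return Xt


def poison(X, y, rate):
    """Trigger a random `rate` fraction of rows and relabel them as TARGET."""
    idx = rng.choice(len(X), size=int(rate * len(X)), replace=False)
    Xp, yp = X.copy(), y.copy()
    Xp[idx] = add_trigger(X[idx])
    yp[idx] = TARGET
    return Xp, yp, idx


def train_logreg(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the logistic loss, with a bias column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)


def attack_success_rate(w, X, y):
    """Fraction of non-target test rows predicted as TARGET once triggered."""
    mask = y != TARGET
    return float(np.mean(predict(w, add_trigger(X[mask])) == TARGET))


X_train, y_train = make_data(2000)
X_test, y_test = make_data(1000)
X_pois, y_pois, poisoned_idx = poison(X_train, y_train, rate=0.05)

w_clean = train_logreg(X_train, y_train)
w_pois = train_logreg(X_pois, y_pois)

acc_clean = float(np.mean(predict(w_clean, X_test) == y_test))
acc_pois = float(np.mean(predict(w_pois, X_test) == y_test))
clean_asr = attack_success_rate(w_clean, X_test, y_test)
poisoned_asr = attack_success_rate(w_pois, X_test, y_test)
print(f"clean-test accuracy: clean model {acc_clean:.2f}, poisoned model {acc_pois:.2f}")
print(f"ASR without poisoning {clean_asr:.2f}, with 5% poisoning {poisoned_asr:.2f}")
```

Comparing the two models shows the intended backdoor signature: the poisoned model should stay accurate on untriggered test rows while flipping many triggered rows to the target label. The exact numbers depend on the seed and synthetic setup and are not comparable to the paper's reported 97%.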
Related papers
- Extracting Training Data from Unconditional Diffusion Models
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
Date: 2024-06-18T16:20:12Z
- Automatic prediction of mortality in patients with mental illness using electronic health records
This paper addresses the persistent challenge of predicting mortality in patients with mental diagnoses.
Data from patients with mental disease diagnoses were extracted from the well-known clinical MIMIC-III data set.
Four machine learning algorithms were used, with results indicating that Random Forest and Support Vector Machine models outperformed others.
Date: 2023-10-18T17:21:01Z
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
Date: 2023-10-04T01:36:30Z
- Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions
Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph based pre-training method helps in modeling the data at a population level.
Date: 2022-03-23T17:59:45Z
- Survival Prediction of Heart Failure Patients using Stacked Ensemble Machine Learning Algorithm
Heart failure is one of the major health hazard issues of our time and is a leading cause of death worldwide.
Data mining is the process of converting massive volumes of raw data created by the healthcare institutions into meaningful information.
Our study shows that only certain attributes collected from the patients are essential for predicting the probability of survival after heart failure.
Date: 2021-08-30T16:42:27Z
- Self-Supervised Graph Learning with Hyperbolic Embedding for Temporal Health Event Prediction
We propose a hyperbolic embedding method with information flow to pre-train medical code representations in a hierarchical structure.
We incorporate these pre-trained representations into a graph neural network to detect disease complications.
We present a new hierarchy-enhanced historical prediction proxy task in our self-supervised learning framework to fully utilize EHR data.
Date: 2021-06-09T00:42:44Z
- Attack-agnostic Adversarial Detection on Medical Data Using Explainable Machine Learning
We propose a model agnostic explainability-based method for the accurate detection of adversarial samples on two datasets.
On the MIMIC-III and Henan-Renmin EHR datasets, we report a detection accuracy of 77% against the Longitudinal Adversarial Attack.
On the MIMIC-CXR dataset, we achieve an accuracy of 88%, significantly improving on the state of the art in adversarial detection on both datasets by over 10% in all settings.
Date: 2021-05-05T10:01:53Z
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
Date: 2020-09-02T02:50:30Z
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from data of hemogram exams performed on symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
Date: 2020-05-10T01:45:03Z
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that with 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
Date: 2020-05-03T02:36:00Z