Deep Contextual Clinical Prediction with Reverse Distillation
- URL: http://arxiv.org/abs/2007.05611v2
- Date: Thu, 17 Dec 2020 01:56:08 GMT
- Title: Deep Contextual Clinical Prediction with Reverse Distillation
- Authors: Rohan S. Kodialam, Rebecca Boiarsky, Justin Lim, Neil Dixit, Aditya
Sai, David Sontag
- Abstract summary: We present a new technique called Reverse Distillation, which pretrains deep models by using high-performing linear models for initialization.
We make use of the longitudinal structure of insurance claims datasets to develop Self Attention with Reverse Distillation, or SARD.
SARD outperforms state-of-the-art methods on multiple clinical prediction outcomes.
- Score: 3.6700088931938835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Healthcare providers are increasingly using machine learning to predict
patient outcomes to make meaningful interventions. However, despite innovations
in this area, deep learning models often struggle to match the performance of
shallow linear models in predicting these outcomes, making it difficult to
leverage such techniques in practice. In this work, motivated by the task of
clinical prediction from insurance claims, we present a new technique called
Reverse Distillation which pretrains deep models by using high-performing
linear models for initialization. We make use of the longitudinal structure of
insurance claims datasets to develop Self Attention with Reverse Distillation,
or SARD, an architecture that utilizes a combination of contextual embedding,
temporal embedding, and self-attention mechanisms and, most critically, is trained
via reverse distillation. SARD outperforms state-of-the-art methods on multiple
clinical prediction outcomes, with ablation studies revealing that reverse
distillation is a primary driver of these improvements. Code is available at
https://github.com/clinicalml/omop-learn.
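To make the pretraining idea concrete, below is a minimal sketch of reverse distillation in PyTorch. It is an illustration under stated assumptions, not the omop-learn implementation: the model is a toy stand-in for SARD (one self-attention layer over code and visit-position embeddings), the linear "teacher" is a bag-of-codes logistic regression assumed to be already trained, and the distillation objective shown (MSE on logits) may differ from the paper's exact formulation.

import torch
import torch.nn as nn

class ToyClaimsTransformer(nn.Module):
    # Toy stand-in for SARD: contextual (code) embeddings plus temporal
    # (visit-position) embeddings, followed by one self-attention layer.
    def __init__(self, n_codes, max_len=512, d_model=64, n_heads=4):
        super().__init__()
        self.code_emb = nn.Embedding(n_codes, d_model)
        self.time_emb = nn.Embedding(max_len, d_model)
        self.encoder = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, codes):  # codes: (batch, seq) of claim-code indices
        positions = torch.arange(codes.size(1), device=codes.device)
        h = self.encoder(self.code_emb(codes) + self.time_emb(positions))
        return self.head(h.mean(dim=1)).squeeze(-1)  # one logit per patient

def reverse_distillation_pretrain(deep_model, linear_teacher, loader, epochs=5):
    # Stage 1 of reverse distillation: fit the deep model to the *linear*
    # model's logits, ignoring the true labels. Stage 2 (standard fine-tuning
    # on the observed outcomes with BCE) is omitted here.
    opt = torch.optim.Adam(deep_model.parameters(), lr=1e-3)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for codes, _labels in loader:
            with torch.no_grad():
                target = linear_teacher(codes).squeeze(-1)  # teacher logits
            loss = mse(deep_model(codes), target)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Hypothetical usage: the teacher is a bag-of-codes logistic regression
# (EmbeddingBag sums per-code weights into a single logit per patient).
n_codes = 1000
teacher = nn.EmbeddingBag(n_codes, 1, mode="sum")
student = ToyClaimsTransformer(n_codes)
loader = [(torch.randint(0, n_codes, (8, 16)), torch.zeros(8))]  # dummy batch
reverse_distillation_pretrain(student, teacher, loader)

After this pretraining stage, the deep model starts from an approximation of the linear model's decision function and is then fine-tuned on the real outcomes; the abstract's ablations credit this initialization for much of SARD's gains.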
Related papers
- Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective [65.70799289211868]
We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation.
We show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation.
arXiv Detail & Related papers (2023-11-28T09:53:05Z)
- A Knowledge Distillation Approach for Sepsis Outcome Prediction from Multivariate Clinical Time Series [2.621671379723151]
We use knowledge distillation via constrained variational inference to distill the knowledge of a powerful "teacher" neural network model.
We train a "student" latent variable model that learns interpretable hidden-state representations while achieving high predictive performance for sepsis outcome prediction.
arXiv Detail & Related papers (2023-11-16T05:06:51Z)
- Online Distillation for Pseudo-Relevance Feedback [16.523925354318983]
We investigate whether a model for a specific query can be effectively distilled from neural re-ranking results.
We find that a lexical model distilled online can reasonably replicate the re-ranking of a neural model.
More importantly, these models can be used as queries that execute efficiently on indexes.
arXiv Detail & Related papers (2023-06-16T07:26:33Z)
- Prediction of Post-Operative Renal and Pulmonary Complications Using Transformers [69.81176740997175]
We evaluate the performance of transformer-based models in predicting postoperative acute renal failure, pulmonary complications, and postoperative in-hospital mortality.
Our results demonstrate that transformer-based models can achieve superior performance in predicting postoperative complications and outperform traditional machine learning models.
arXiv Detail & Related papers (2023-06-01T14:08:05Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
- Self Context and Shape Prior for Sensorless Freehand 3D Ultrasound Reconstruction [61.62191904755521]
3D freehand US reconstruction is a promising way to address this problem, as it provides a broad scanning range and freeform scans.
Existing deep learning-based methods focus only on basic skill sequences.
We propose a novel approach to sensorless freehand 3D US reconstruction that considers complex skill sequences.
arXiv Detail & Related papers (2021-07-31T16:06:50Z)
- On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models initialized with ImageNet pretraining show a significant increase in performance, generalization, and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z)
- The unreasonable effectiveness of Batch-Norm statistics in addressing catastrophic forgetting across medical institutions [8.244654685687054]
We investigate the trade-off between model refinement and retention of previously learned knowledge.
We propose a simple yet effective approach that adapts Elastic Weight Consolidation (EWC) using the global batch-normalization statistics of the original dataset; a generic EWC penalty is sketched after this list.
arXiv Detail & Related papers (2020-11-16T16:57:05Z)
- Bidirectional Representation Learning from Transformers using Multimodal Electronic Health Record Data to Predict Depression [11.1492931066686]
We present a temporal deep learning model that performs bidirectional representation learning on EHR sequences to predict depression.
The model achieved the largest improvement in precision-recall area under the curve (PRAUC), from 0.70 to 0.76, for depression prediction compared with the best baseline model.
arXiv Detail & Related papers (2020-09-26T17:56:37Z)
- Automatic Data Augmentation via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation [57.78765460295249]
We develop a novel automatic learning-based data augmentation method for medical image segmentation.
In our method, we combine the data augmentation module and the subsequent segmentation module in an end-to-end training manner with a consistent loss.
We extensively evaluated our method on CT kidney tumor segmentation, and the results validate its promise.
arXiv Detail & Related papers (2020-02-22T14:10:13Z)
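For the Batch-Norm/EWC entry above, the following is a generic sketch of the Elastic Weight Consolidation penalty, assuming a PyTorch model. It shows only the standard quadratic penalty, not that paper's batch-normalization-statistics adaptation; the names fisher and anchor_params (per-parameter dictionaries) are illustrative assumptions.

import torch

def ewc_penalty(model, fisher, anchor_params, lam=100.0):
    # Standard EWC: penalize drift from the parameters learned on the original
    # institution's data, weighted by an estimate of the Fisher information:
    # penalty = (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - anchor_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Hypothetical usage inside a fine-tuning loop on a new institution's data:
#   total_loss = task_loss + ewc_penalty(model, fisher, anchor_params)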
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.