Ontology-driven weak supervision for clinical entity classification in
electronic health records
- URL: http://arxiv.org/abs/2008.01972v2
- Date: Tue, 6 Apr 2021 04:11:52 GMT
- Title: Ontology-driven weak supervision for clinical entity classification in
electronic health records
- Authors: Jason A. Fries, Ethan Steinberg, Saelig Khattar, Scott L. Fleming,
Jose Posada, Alison Callahan, Nigam H. Shah
- Abstract summary: We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules.
Unlike hand-labeled notes, our approach is easy to share and modify, while offering performance comparable to learning from manually labeled training data.
- Score: 6.815543071244677
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the electronic health record, using clinical notes to identify entities
such as disorders and their temporality (e.g. the order of an event relative to
a time index) can inform many important analyses. However, creating training
data for clinical entity tasks is time consuming and sharing labeled data is
challenging due to privacy concerns. The information needs of the COVID-19
pandemic highlight the need for agile methods of training machine learning
models for clinical notes. We present Trove, a framework for weakly supervised
entity classification using medical ontologies and expert-generated rules. Our
approach, unlike hand-labeled notes, is easy to share and modify, while
offering performance comparable to learning from manually labeled training
data. In this work, we validate our framework on six benchmark tasks and
demonstrate Trove's ability to analyze the records of patients visiting the
emergency department at Stanford Health Care for COVID-19 presenting symptoms
and risk factors.
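The abstract describes combining ontology lookups and expert rules into weak labels for entity spans. A minimal sketch of that idea, in the style of data programming, is shown below; the term sets, rule, and majority-vote aggregation are toy stand-ins (real systems would query ontologies such as UMLS or SNOMED CT and use a learned label model), not the paper's actual resources or API.

```python
from collections import Counter

# Hypothetical sketch: labeling functions (ontology lookups and an expert
# rule) each vote on a candidate entity span; a majority vote over
# non-abstaining votes yields the weak label.
ABSTAIN, DISORDER, NOT_DISORDER = -1, 1, 0

# Toy "ontology" term sets standing in for real medical ontologies.
DISORDER_TERMS = {"pneumonia", "fever", "cough", "diabetes"}
NON_DISORDER_TERMS = {"aspirin", "stanford", "discharge"}

def lf_ontology_disorder(span):
    # Vote DISORDER if the span matches a disorder ontology term.
    return DISORDER if span.lower() in DISORDER_TERMS else ABSTAIN

def lf_ontology_negative(span):
    # Vote NOT_DISORDER for terms known not to be disorders.
    return NOT_DISORDER if span.lower() in NON_DISORDER_TERMS else ABSTAIN

def lf_rule_suffix(span):
    # Expert rule: an "-itis" suffix usually indicates an inflammatory disorder.
    return DISORDER if span.lower().endswith("itis") else ABSTAIN

LABELING_FUNCTIONS = [lf_ontology_disorder, lf_ontology_negative, lf_rule_suffix]

def weak_label(span):
    # Collect non-abstaining votes and take the majority.
    votes = [v for v in (lf(span) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

For example, `weak_label("arthritis")` is labeled a disorder by the suffix rule even though the term is absent from the toy ontology, which is the kind of coverage that combining ontologies with expert rules is meant to provide.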
Related papers
- Improving Clinical Note Generation from Complex Doctor-Patient Conversation [20.2157016701399]
We present three key contributions to the field of clinical note generation using large language models (LLMs)
First, we introduce CliniKnote, a dataset consisting of 1,200 complex doctor-patient conversations paired with their full clinical notes.
Second, we propose K-SOAP, which enhances traditional SOAP (Subjective, Objective, Assessment, and Plan) notes by adding a keyword section at the top, allowing for quick identification of essential information.
Third, we develop an automatic pipeline to generate K-SOAP notes from doctor-patient conversations and benchmark various modern LLMs.
arXiv Detail & Related papers (2024-08-26T18:39:31Z)
- Multimodal Pretraining of Medical Time Series and Notes [45.89025874396911]
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data.
We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes.
In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
arXiv Detail & Related papers (2023-12-11T21:53:40Z)
- Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes [6.1656026560972]
This work conceptualizes the use of EHR audit logs for machine learning as a source of supervision of note relevance in a specific clinical context.
We show that our methods can achieve an AUC of 0.963 for predicting which notes will be read in an individual note writing session.
arXiv Detail & Related papers (2023-08-09T21:04:19Z)
- Classifying Unstructured Clinical Notes via Automatic Weak Supervision [17.45660355026785]
We introduce a general weakly-supervised text classification framework that learns from class-label descriptions only.
We leverage the linguistic domain knowledge stored within pre-trained language models and the data programming framework to assign code labels to texts.
arXiv Detail & Related papers (2022-06-24T05:55:49Z)
- When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study [77.34726150561087]
COVID-19 pandemic has spread rapidly and caused a shortage of global medical resources.
CNN has been widely utilized and verified in analyzing medical images.
arXiv Detail & Related papers (2022-03-24T02:09:41Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
- Self-Supervised Graph Learning with Hyperbolic Embedding for Temporal Health Event Prediction [13.24834156675212]
We propose a hyperbolic embedding method with information flow to pre-train medical code representations in a hierarchical structure.
We incorporate these pre-trained representations into a graph neural network to detect disease complications.
We present a new hierarchy-enhanced historical prediction proxy task in our self-supervised learning framework to fully utilize EHR data.
arXiv Detail & Related papers (2021-06-09T00:42:44Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
- Uncovering the structure of clinical EEG signals with self-supervised learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG).
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv Detail & Related papers (2020-07-31T14:34:47Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.