Distantly supervised end-to-end medical entity extraction from
electronic health records with human-level quality
- URL: http://arxiv.org/abs/2201.10463v1
- Date: Tue, 25 Jan 2022 17:04:46 GMT
- Title: Distantly supervised end-to-end medical entity extraction from
electronic health records with human-level quality
- Authors: Alexander Nesterov and Dmitry Umerenkov
- Abstract summary: We propose a novel method of doing medical EE from electronic health records (EHR) as a single-step multi-label classification task.
Our model is trained end-to-end in a distantly supervised manner, using targets automatically extracted from a medical knowledge base.
Our work demonstrates that medical entity extraction can be done end-to-end, without human supervision and with human-level quality, given a large enough amount of unlabeled EHRs and a medical knowledge base.
- Score: 77.34726150561087
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Medical entity extraction (EE) is a standard first stage in medical
text processing. Usually, medical EE is a two-step process: named entity
recognition (NER) followed by named entity normalization (NEN). We propose a
novel method of doing medical EE from electronic health records (EHR) as a
single-step multi-label classification task, by fine-tuning a transformer model
pretrained on a large EHR dataset. Our model is trained end-to-end in a
distantly supervised manner, using targets automatically extracted from a
medical knowledge base. We show that our model learns to generalize to entities
that appear frequently enough, achieving human-level classification quality for
the most frequent entities. Our work demonstrates that medical entity
extraction can be done end-to-end, without human supervision and with
human-level quality, given a large enough amount of unlabeled EHRs and a
medical knowledge base.
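The single-step formulation above replaces the NER+NEN pipeline with one multi-label classifier whose training targets come from matching knowledge-base terms against raw EHR text. A minimal sketch of that target-construction step (the concept IDs and surface forms below are invented for illustration; the paper does not specify its matching pipeline, so treat this as an assumption about how distant labels could be built):

```python
# Hypothetical distant-supervision target construction: build multi-hot
# label vectors for a multi-label classifier by matching knowledge-base
# surface forms against unlabeled EHR notes -- no human annotation.

KNOWLEDGE_BASE = {  # concept_id -> surface forms (illustrative only)
    "C0020538": ["hypertension", "high blood pressure"],
    "C0011849": ["diabetes mellitus", "diabetes"],
    "C0004096": ["asthma"],
}
CONCEPTS = sorted(KNOWLEDGE_BASE)  # fixed label order for the classifier head


def distant_labels(ehr_text: str) -> list[int]:
    """Multi-hot target vector: 1 if any surface form of the concept
    appears in the note, else 0."""
    text = ehr_text.lower()
    return [
        int(any(form in text for form in KNOWLEDGE_BASE[cid]))
        for cid in CONCEPTS
    ]


note = "Patient with long-standing diabetes and high blood pressure."
print(distant_labels(note))  # -> [0, 1, 1] (asthma, diabetes, hypertension)
```

The resulting vectors would serve as targets for a binary-cross-entropy loss over all concepts at once, which is what makes the extraction single-step: the model predicts normalized concept IDs directly, with no intermediate span detection.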
Related papers
- STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Next Visit Diagnosis Prediction via Medical Code-Centric Multimodal Contrastive EHR Modelling with Hierarchical Regularisation [0.0]
We propose NECHO, a novel medical code-centric multimodal contrastive EHR learning framework with hierarchical regularisation.
First, we integrate multifaceted information encompassing medical codes, demographics, and clinical notes using a tailored network design.
We also regularise modality-specific encoders using parental-level information in the medical ontology to learn the hierarchical structure of EHR data.
arXiv Detail & Related papers (2024-01-22T01:58:32Z) - INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and
Prognosis [19.32686665459374]
We introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE).
INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e., demographics, diagnoses, procedures, vitals, and medications).
arXiv Detail & Related papers (2023-11-17T07:28:16Z) - BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - Towards Medical Artificial General Intelligence via Knowledge-Enhanced
Multimodal Pretraining [121.89793208683625]
Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks.
We propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR).
arXiv Detail & Related papers (2023-04-26T01:26:19Z) - How to Leverage Multimodal EHR Data for Better Medical Predictions? [13.401754962583771]
The complexity of electronic health records (EHR) data is a challenge for the application of deep learning.
In this paper, we first extract the accompanying clinical notes from EHR and propose a method to integrate these data.
The results on two medical prediction tasks show that our fused model with different data outperforms the state-of-the-art method.
arXiv Detail & Related papers (2021-10-29T13:26:05Z) - Recognising Biomedical Names: Challenges and Solutions [9.51284672475743]
We propose a transition-based NER model which can recognise discontinuous mentions.
We also develop a cost-effective approach that nominates the suitable pre-training data.
Our contributions have obvious practical implications, especially when new biomedical applications are needed.
arXiv Detail & Related papers (2021-06-23T08:20:13Z) - BiteNet: Bidirectional Temporal Encoder Network to Predict Medical
Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.