Towards System Modelling to Support Diseases Data Extraction from the Electronic Health Records for Physicians Research Activities
- URL: http://arxiv.org/abs/2404.01218v1
- Date: Mon, 1 Apr 2024 16:18:40 GMT
- Title: Towards System Modelling to Support Diseases Data Extraction from the Electronic Health Records for Physicians Research Activities
- Authors: Bushra F. Alsaqer, Alaa F. Alsaqer, Amna Asif
- Abstract summary: This paper aims to make such data usable for research activities such as monitoring disease statistics for a specific population.
One of the limitations of EHRs systems is that the data is not available in the standard format but in various forms.
It is required to first convert the names of the diseases and demographics data into one standardized form to make it usable for research activities.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of Electronic Health Records (EHRs) has increased dramatically in the past 15 years, as they are considered an important source for managing patient data. EHRs are primary sources of disease diagnosis and demographic data of patients worldwide. Therefore, the data can be utilized for secondary tasks such as research. This paper aims to make such data usable for research activities such as monitoring disease statistics for a specific population. As a result, researchers can detect disease causes linked to the behavior and lifestyle of the target group. One of the limitations of EHR systems is that the data is not available in a standard format but in various forms. Therefore, the names of the diseases and the demographic data must first be converted into one standardized form to make them usable for research activities. A large number of EHRs are available, and solving the standardization issues requires optimized techniques. We used a first-hand EHR dataset extracted from EHR systems. Our application uploads the dataset from the EHRs and converts it to the ICD-10 coding system to solve the standardization problem. To convert the data into the standard form, we apply the steps of pre-processing, annotation, and transformation. Pre-processing normalizes the demographic formats. In the annotation step, a machine learning model is used to recognize diseases in the text. The transformation step then converts each disease name to the ICD-10 coding format. The model was evaluated manually by comparing its disease recognition performance with an available dictionary-based system (MetaMap). The accuracy of the proposed machine learning model is 81%, outperforming MetaMap's accuracy of 67%. This paper contributes to system modelling for EHR data extraction to support research activities.
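The three steps named in the abstract (pre-processing, annotation, transformation to ICD-10) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a machine learning model for annotation, whereas the sketch below substitutes a small keyword lookup purely for illustration. The record fields (`gender`, `dob`, `note`), the `SYNONYMS` table, the function names, and the three-entry `ICD10_CODES` table are all assumptions introduced here.

```python
import re
from datetime import datetime

# Hypothetical three-entry lookup; a real system would load the full ICD-10 catalogue.
ICD10_CODES = {
    "type 2 diabetes mellitus": "E11",
    "essential hypertension": "I10",
    "asthma": "J45",
}

# Hypothetical synonym table standing in for the paper's machine learning annotator.
SYNONYMS = {
    "t2dm": "type 2 diabetes mellitus",
    "type 2 diabetes": "type 2 diabetes mellitus",
    "high blood pressure": "essential hypertension",
    "hypertension": "essential hypertension",
    "asthma": "asthma",
}


def preprocess(record):
    """Normalize demographic fields (gender spelling, date format)."""
    gender = record["gender"].strip().lower()
    gender = {"m": "male", "f": "female"}.get(gender, gender)
    dob = None
    # Assumed input formats: ISO dates or day/month/year.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            dob = datetime.strptime(record["dob"].strip(), fmt).date().isoformat()
            break
        except ValueError:
            continue
    return {"gender": gender, "dob": dob, "note": record["note"]}


def annotate(note):
    """Return canonical disease names mentioned in free text (keyword stand-in)."""
    text = note.lower()
    found = []
    for phrase, name in SYNONYMS.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text) and name not in found:
            found.append(name)
    return found


def transform(disease_names):
    """Map canonical disease names to ICD-10 codes, sorted for stable output."""
    return sorted(ICD10_CODES[name] for name in disease_names if name in ICD10_CODES)


def standardize(record):
    """Run pre-processing, annotation, and transformation on one record."""
    clean = preprocess(record)
    codes = transform(annotate(clean["note"]))
    return {"gender": clean["gender"], "dob": clean["dob"], "icd10": codes}


if __name__ == "__main__":
    example = {"gender": " M ", "dob": "05/03/1980",
               "note": "Pt has T2DM and high blood pressure."}
    print(standardize(example))
```

Keeping the annotator behind a single `annotate` function means a trained disease recognition model could replace the keyword lookup without touching the other two steps.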
Related papers
- IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records [7.451873794596469]
We propose a novel deep-learning model that learns the underlying patient dynamics to generate personalized values conditioning on an individual's demographic characteristics and treatments.
Our proposed model, IGNITE, utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual.
We show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.
arXiv Detail & Related papers (2024-01-09T07:57:21Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z)
- Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions [48.02011627390706]
Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph based pre-training method helps in modeling the data at a population level.
arXiv Detail & Related papers (2022-03-23T17:59:45Z)
- Pre-training transformer-based framework on large-scale pediatric claims data for downstream population-specific tasks [3.1580072841682734]
This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset.
The effective knowledge transfer is completed through the task-aware fine-tuning stage.
We conducted experiments on a real-world claims dataset with more than one million patient records.
arXiv Detail & Related papers (2021-06-24T15:25:41Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records [0.5672132510411464]
We develop and publish a Python package to transform a public health dataset into an easy-to-access universal format.
We propose two novel model architectures to predict multiple diagnoses simultaneously.
Both models can predict multiple diagnoses simultaneously with high accuracy.
arXiv Detail & Related papers (2020-06-23T14:58:58Z)
- Generation of Differentially Private Heterogeneous Electronic Health Records [9.926231893220061]
We explore using Generative Adversarial Networks to generate synthetic, heterogeneous EHRs.
We will explore applying differential privacy (DP) preserving optimization in order to produce DP synthetic EHR data sets.
arXiv Detail & Related papers (2020-06-05T13:21:46Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)