Design and Implementation of a Scalable Clinical Data Warehouse for Resource-Constrained Healthcare Systems
- URL: http://arxiv.org/abs/2502.16674v1
- Date: Sun, 23 Feb 2025 18:19:30 GMT
- Title: Design and Implementation of a Scalable Clinical Data Warehouse for Resource-Constrained Healthcare Systems
- Authors: Shovito Barua Soumma, Fahim Shahriar, Umme Niraj Mahi, Md Hasin Abrar, Md Abdur Rahman Fahad, Abu Sayed Md. Latiful Hoque,
- Abstract summary: This study proposes a scalable, privacy-limited clinical data warehouse, NCDW, designed for heterogeneous EHR integration in resource-limited settings.<n>The framework can be adapted to various healthcare settings across developing nations by modifying the ingestion layer to accommodate standards like ICD-11 and HL7 FHIR.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Centralized electronic health record repositories are critical for advancing disease surveillance, public health research, and evidence-based policymaking. However, developing countries face persistent challenges in achieving this due to fragmented healthcare data sources, inconsistent record-keeping practices, and the absence of standardized patient identifiers, limiting reliable record linkage, compromise data interoperability, and limit scalability-obstacles exacerbated by infrastructural constraints and privacy concerns. To address these barriers, this study proposes a scalable, privacy-preserving clinical data warehouse, NCDW, designed for heterogeneous EHR integration in resource-limited settings and tested with 1.16 million clinical records. The framework incorporates a wrapper-based data acquisition layer for secure, automated ingestion of multisource health data and introduces a soundex algorithm to resolve patient identity mismatches in the absence of unique IDs. A modular data mart is designed for disease-specific analytics, demonstrated through a dengue fever case study in Bangladesh, integrating clinical, demographic, and environmental data for outbreak prediction and resource planning. Quantitative assessment of the data mart underscores its utility in strengthening national decision-support systems, highlighting the model's adaptability for infectious disease management. Comparative evaluation of database technologies reveals NoSQL outperforms relational SQL by 40-69% in complex query processing, while system load estimates validate the architecture's capacity to manage 19 million daily records (34TB over 5 years). The framework can be adapted to various healthcare settings across developing nations by modifying the ingestion layer to accommodate standards like ICD-11 and HL7 FHIR, facilitating interoperability for managing infectious diseases (i.e., COVID, tuberculosis).
Related papers
- Leveraging Generative AI Through Prompt Engineering and Rigorous Validation to Create Comprehensive Synthetic Datasets for AI Training in Healthcare [0.0]
The GPT-4 API was employed to generate high-quality synthetic datasets aimed at overcoming this limitation.
The generated data encompassed a comprehensive array of patient admission information, including healthcare provider details, hospital departments, wards, bed assignments, patient demographics, emergency contacts, vital signs, immunizations, allergies, medical histories, appointments, hospital visits, laboratory tests, diagnoses, treatment plans, medications, clinical notes, visit logs, discharge summaries, and referrals.
arXiv Detail & Related papers (2025-04-29T16:37:34Z) - Cloud based DevOps Framework for Identifying Risk Factors of Hospital Utilization [0.0]
This study aims to investigate the integration of continuous integration and deployment (CI/CD) practices in data science.
An end-to-end cloud-based DevOps framework is proposed for data analysis which examines risk factors associated with hospital utilization.
The framework can be especially useful for sparse dataset domains such as environmental science, robotics, cybersecurity, and cultural heritage and arts.
arXiv Detail & Related papers (2025-04-18T22:45:12Z) - Quantum-Inspired Privacy-Preserving Federated Learning Framework for Secure Dementia Classification [0.0]
This paper introduces a novel framework that integrates federated learning with quantum-inspired encryption techniques for dementia classification.
The framework offers significant implications for democratizing access to AI-driven dementia diagnostics in low- and middle-income countries.
arXiv Detail & Related papers (2025-03-05T08:49:31Z) - Electronic Health Records: Towards Digital Twins in Healthcare [1.702436955667761]
This chapter explores the evolution and significance of healthcare information systems.<n>It begins with an examination of the implementation of EHR in the UK and the USA.<n>It provides a comprehensive overview of the International Classification of Diseases (ICD) system, tracing its development from ICD-9 to ICD-10.<n>Central to this discussion is the MIMIC-III database, a landmark achievement in healthcare data sharing and arguably the most comprehensive critical care database freely available to researchers worldwide.
arXiv Detail & Related papers (2025-01-16T16:30:02Z) - FedCVD: The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data [52.55123685248105]
Cardiovascular diseases (CVDs) are currently the leading cause of death worldwide, highlighting the critical need for early diagnosis and treatment.
Machine learning (ML) methods can help diagnose CVDs early, but their performance relies on access to substantial data with high quality.
This paper presents the first real-world FL benchmark for cardiovascular disease detection, named FedCVD.
arXiv Detail & Related papers (2024-10-28T02:24:01Z) - Prediction and Detection of Terminal Diseases Using Internet of Medical Things: A Review [4.4389631374821255]
AI-driven models have achieved over 98% accuracy in predicting heart disease, chronic kidney disease (CKD), Alzheimer's disease, and lung cancer.
The incorporation of IoMT data, which is vast and heterogeneous, adds complexities in ensuring interoperability and security to protect patient privacy.
Future research should focus on data standardization and advanced preprocessing techniques to improve data quality and interoperability.
arXiv Detail & Related papers (2024-09-22T15:02:33Z) - FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection [83.54960238236548]
FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models.
FEDMEKI allows medical foundation models to learn from a broader spectrum of medical knowledge without direct data exposure.
arXiv Detail & Related papers (2024-08-17T15:18:56Z) - Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs [3.8895618250348116]
Average financial impact of a data breach in recent months has been estimated to be close to USD 10 million.
Computer-based systems for de-identification of personal information are vulnerable to data drift.
arXiv Detail & Related papers (2024-07-08T12:47:03Z) - Revisiting Computer-Aided Tuberculosis Diagnosis [56.80999479735375]
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually.
Computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data.
We establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas.
This dataset enables the training of sophisticated detectors for high-quality CTD.
arXiv Detail & Related papers (2023-07-06T08:27:48Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Trajectories, bifurcations and pseudotime in large clinical datasets:
applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z) - From predictions to prescriptions: A data-driven response to COVID-19 [42.57407485467993]
We propose a comprehensive data-driven approach to understand the clinical characteristics of COVID-19.
We build personalized calculators to predict the risk of infection and mortality.
We propose an optimization model to re-allocate ventilators and alleviate shortages.
arXiv Detail & Related papers (2020-06-30T03:34:00Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - Predictive Modeling of ICU Healthcare-Associated Infections from
Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling
Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.