A Methodology for a Scalable, Collaborative, and Resource-Efficient
Platform to Facilitate Healthcare AI Research
- URL: http://arxiv.org/abs/2112.06883v1
- Date: Mon, 13 Dec 2021 18:39:10 GMT
- Title: A Methodology for a Scalable, Collaborative, and Resource-Efficient
Platform to Facilitate Healthcare AI Research
- Authors: Raphael Y. Cohen and Vesela P. Kovacheva
- Abstract summary: We present a system to accelerate data acquisition, dataset development and analysis, and AI model development.
This system can ingest 15,000 patient records per hour, where each record represents thousands of measurements, text notes, and high-resolution data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Healthcare AI holds the potential to increase patient safety, augment
efficiency and improve patient outcomes, yet research is often limited by data
access, cohort curation, and tooling for analysis. Collection and translation
of electronic health record data, live data, and real-time high-resolution
device data can be challenging and time-consuming. Developing real-world AI
tools requires overcoming challenges in data acquisition, scarce hospital
resources, and stringent data governance requirements. These bottlenecks can
make AI research and development resource-intensive and slow. We present a
system and methodology to accelerate data acquisition,
dataset development and analysis, and AI model development. We created an
interactive platform that relies on a scalable microservice backend. This
system can ingest 15,000 patient records per hour, each record comprising
thousands of multimodal measurements, text notes, and high-resolution data.
Collectively, these records can approach a terabyte of data. The system can
further perform cohort generation and preliminary dataset analysis in 2-5
minutes. As a result, multiple users can collaborate simultaneously to iterate
on datasets and models in real time. We anticipate that this approach will
drive real-world AI model development, and, in the long run, meaningfully
improve healthcare delivery.
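The abstract describes the platform only at a high level: bulk ingestion of multimodal patient records into a scalable microservice backend, followed by cohort generation and preliminary analysis fast enough (2-5 minutes) for several users to iterate together. The paper's code is not reproduced here; the Python below is a minimal, hypothetical sketch of that ingest-then-query flow, and every name in it (PatientRecord, CohortStore, ingest_batch, build_cohort) is an assumption rather than the authors' API.

```python
# Hypothetical sketch only -- none of these class or method names come from the
# paper; they merely illustrate the ingest-then-query pattern the abstract
# describes (bulk record ingestion, then fast cohort generation).
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PatientRecord:
    patient_id: str
    measurements: Dict[str, float]            # e.g. vitals or lab values
    notes: List[str] = field(default_factory=list)


class CohortStore:
    """In-memory stand-in for the scalable microservice backend."""

    def __init__(self) -> None:
        self._records: Dict[str, PatientRecord] = {}

    def ingest_batch(self, records: List[PatientRecord]) -> int:
        # The real system ingests ~15,000 records/hour through horizontally
        # scaled services; a dictionary insert keeps this example self-contained.
        for rec in records:
            self._records[rec.patient_id] = rec
        return len(records)

    def build_cohort(self, measure: str, threshold: float) -> List[str]:
        # Cohort generation: patients whose measurement exceeds a threshold.
        return [
            pid for pid, rec in self._records.items()
            if rec.measurements.get(measure, float("-inf")) > threshold
        ]


if __name__ == "__main__":
    store = CohortStore()
    store.ingest_batch([
        PatientRecord("p1", {"heart_rate": 118.0}, ["tachycardic on admission"]),
        PatientRecord("p2", {"heart_rate": 72.0}),
    ])
    print(store.build_cohort("heart_rate", threshold=100.0))  # ['p1']
```

In the actual system the store would be a persistent, horizontally scaled service rather than an in-memory dictionary; the simplification here only keeps the example runnable.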
Related papers
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z) - Automated Multi-Task Learning for Joint Disease Prediction on Electronic Health Records [4.159498069487535]
We propose an automated approach named AutoDP, which can search for the optimal configuration of task grouping and architectures simultaneously.
It achieves significant performance improvements over both hand-crafted and automated state-of-the-art methods while maintaining a feasible search cost.
arXiv Detail & Related papers (2024-03-06T22:32:48Z) - README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP [9.432205523734707]
We introduce a new task of automatically generating lay definitions, aiming to simplify medical terms into patient-friendly lay language.
We first created a dataset: an extensive collection of over 50,000 unique (medical term, lay definition) pairs and 300,000 mentions.
We have also engineered a data-centric Human-AI pipeline that combines data filtering, augmentation, and selection to improve data quality (a minimal sketch of such a loop appears after this list).
arXiv Detail & Related papers (2023-12-24T23:01:00Z) - Building Flexible, Scalable, and Machine Learning-ready Multimodal
Oncology Datasets [17.774341783844026]
This work proposes the Multimodal Integration of Oncology Data System (MINDS).
MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources.
By harmonizing multimodal data, MINDS aims to empower researchers with greater analytical ability.
arXiv Detail & Related papers (2023-09-30T15:44:39Z) - Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from the Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z) - Data-centric Artificial Intelligence: A Survey [47.24049907785989]
Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI.
In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals.
We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
arXiv Detail & Related papers (2023-03-17T17:44:56Z) - Deep Learning and Handheld Augmented Reality Based System for Optimal
Data Collection in Fault Diagnostics Domain [0.0]
This paper presents a novel human-machine interaction framework to perform fault diagnostics with minimal data.
Minimizing the required data will increase the practicability of data-driven models in diagnosing faults.
The proposed framework provided high precision and recall on a novel dataset with only one instance of each fault condition.
arXiv Detail & Related papers (2022-06-15T19:15:26Z) - Robust and Efficient Medical Imaging with Self-Supervision [80.62711706785834]
We present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI.
We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data.
arXiv Detail & Related papers (2022-05-19T17:34:18Z) - Synthetic Data: Opening the data floodgates to enable faster, more
directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z) - BiteNet: Bidirectional Temporal Encoder Network to Predict Medical
Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z) - A Deep Learning Pipeline for Patient Diagnosis Prediction Using
Electronic Health Records [0.5672132510411464]
We develop and publish a Python package to transform public health datasets into an easy-to-access universal format.
We propose two novel model architectures to predict multiple diagnoses simultaneously; both achieve high accuracy.
arXiv Detail & Related papers (2020-06-23T14:58:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.