Platform for generating medical datasets for machine learning in public
health
- URL: http://arxiv.org/abs/2310.08532v1
- Date: Thu, 12 Oct 2023 17:23:52 GMT
- Title: Platform for generating medical datasets for machine learning in public
health
- Authors: Anna Andreychenko, Viktoriia Korzhuk, Stanislav Kondratenko, Polina
Cheraneva
- Abstract summary: This paper presents a concept of a platform for the sustainable generation of quality, reliable multimodal medical datasets.
The platform collects data from different external sources, harmonizes it with a dedicated service, anonymizes the harmonized data, and labels the processed data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Interoperability problems between medical data and related
population data sources currently stand in the way of generating
high-quality datasets at the city, regional and national levels. Moreover,
while large medical centers can assemble datasets internally thanks to
their own IT departments, collecting raw medical data from multiple
organizations is a considerably more complicated process. In these
circumstances, the most appropriate option is to build digital products on
a microservice architecture. This approach makes it possible to ensure the
multimodality of the system and the flexibility of its interfaces, and to
treat the platform as a system whose interconnected elements behave as a
whole, exhibiting behavior that differs from how they behave when working
independently. These conditions, in turn, help maximize the size and
representativeness of the resulting datasets. This paper presents a concept
of a platform for the sustainable generation of quality, reliable
multimodal medical datasets: it collects data from different external
sources, harmonizes it with a dedicated service, anonymizes the harmonized
data, and labels the processed data. The proposed system aims to be a
promising solution for improving the quality of medical data for machine
learning.
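The pipeline sketched in the abstract has four stages: collect from external
sources, harmonize, anonymize, and label. The sketch below illustrates how such
a stage-by-stage flow could be composed. It is a minimal, hypothetical example:
the record fields, the harmonization rule, the salted-hash pseudonymization and
the labeling rule are assumptions made for illustration, not the platform's
actual services or schema.

```python
# Hypothetical sketch of the collect -> harmonize -> anonymize -> label flow
# described in the abstract. Field names and rules are illustrative only.
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Record:
    """One multimodal record flowing through the pipeline."""
    patient_id: str
    modality: str                      # e.g. "lab", "imaging", "text"
    payload: dict = field(default_factory=dict)
    label: Optional[str] = None


def collect(sources: List[Callable[[], List[Record]]]) -> List[Record]:
    """Stage 1: pull raw records from every registered external source."""
    records: List[Record] = []
    for source in sources:
        records.extend(source())
    return records


def harmonize(record: Record) -> Record:
    """Stage 2: map source-specific field names onto a common schema (toy rule)."""
    payload = {key.lower().replace(" ", "_"): value
               for key, value in record.payload.items()}
    return Record(record.patient_id, record.modality, payload, record.label)


def anonymize(record: Record) -> Record:
    """Stage 3: replace the direct identifier with a salted hash (pseudonymization)."""
    pseudo_id = hashlib.sha256(f"demo-salt:{record.patient_id}".encode()).hexdigest()[:12]
    return Record(pseudo_id, record.modality, record.payload, record.label)


def label(record: Record) -> Record:
    """Stage 4: attach a label; a real service would route to annotators or a model."""
    record.label = "abnormal" if record.payload.get("glucose_mmol_l", 0.0) > 7.0 else "normal"
    return record


def run_pipeline(sources: List[Callable[[], List[Record]]]) -> List[Record]:
    """Compose the four stages; each could sit behind its own microservice API."""
    return [label(anonymize(harmonize(r))) for r in collect(sources)]


if __name__ == "__main__":
    # A fake external source standing in for a hospital information system.
    fake_source = lambda: [Record("P001", "lab", {"Glucose mmol l": 8.2})]
    for rec in run_pipeline([fake_source]):
        print(rec)
```

In the platform described by the paper, these stages are handled by separate
services in a microservice architecture (for example behind APIs or message
queues) rather than composed in one process as above; the sketch only
illustrates the order of the data flow from raw external records to anonymized,
labeled datasets.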
Related papers
- Building Flexible, Scalable, and Machine Learning-ready Multimodal
Oncology Datasets [17.774341783844026]
This work proposes Multimodal Integration of Oncology Data System (MINDS)
MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources.
By harmonizing multimodal data, MINDS aims to empower researchers with greater analytical ability.
arXiv Detail & Related papers (2023-09-30T15:44:39Z)
- UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human
Generation [59.77275587857252]
A holistic human dataset inevitably has insufficient and low-resolution information on local parts.
We propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model.
arXiv Detail & Related papers (2023-09-25T17:58:46Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- Patchwork Learning: A Paradigm Towards Integrative Analysis across
Diverse Biomedical Data Sources [40.32772510980854]
"patchwork learning" (PL) is a paradigm that integrates information from disparate datasets composed of different data modalities.
PL allows the simultaneous utilization of complementary data sources while preserving data privacy.
We present the concept of patchwork learning and its current implementations in healthcare, exploring the potential opportunities and applicable data sources.
arXiv Detail & Related papers (2023-05-10T14:50:33Z)
- Integrated multimodal artificial intelligence framework for healthcare
applications [3.6222901399459215]
We propose and evaluate a unified Holistic AI in Medicine framework to facilitate the generation and testing of AI systems that leverage multimodal inputs.
Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments.
We show that this framework can consistently and robustly produce models that outperform similar single-source approaches across various healthcare demonstrations.
arXiv Detail & Related papers (2022-02-25T22:08:09Z)
- Practical Challenges in Differentially-Private Federated Survival
Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the process of training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, the noise added for differential privacy makes it harder for the models to converge.
We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z)
- A Methodology for a Scalable, Collaborative, and Resource-Efficient
Platform to Facilitate Healthcare AI Research [0.0]
We present a system to accelerate data acquisition, dataset development and analysis, and AI model development.
This system can ingest 15,000 patient records per hour, where each record represents thousands of measurements, text notes, and high-resolution data.
arXiv Detail & Related papers (2021-12-13T18:39:10Z)
- The Medkit-Learn(ing) Environment: Medical Decision Modelling through
Simulation [81.72197368690031]
We present a new benchmarking suite designed specifically for medical sequential decision making.
The Medkit-Learn(ing) Environment is a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data.
arXiv Detail & Related papers (2021-06-08T10:38:09Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
- A Deep Learning Pipeline for Patient Diagnosis Prediction Using
Electronic Health Records [0.5672132510411464]
We develop and publish a Python package to transform a public health dataset into an easy-to-access universal format.
We propose two novel model architectures to predict multiple diagnoses simultaneously.
Both models can predict multiple diagnoses simultaneously with high accuracy.
arXiv Detail & Related papers (2020-06-23T14:58:58Z)
- MS-Net: Multi-Site Network for Improving Prostate Segmentation with
Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)