Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets
- URL: http://arxiv.org/abs/2310.01438v2
- Date: Fri, 22 Dec 2023 15:59:38 GMT
- Title: Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets
- Authors: Aakash Tripathi, Asim Waqas, Kavya Venkatesan, Yasin Yilmaz, Ghulam Rasool
- Abstract summary: This work proposes the Multimodal Integration of Oncology Data System (MINDS).
MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources.
By harmonizing multimodal data, MINDS aims to empower researchers with greater analytical ability.
- Score: 17.774341783844026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advancements in data acquisition, storage, and processing techniques have
resulted in the rapid growth of heterogeneous medical data. Integrating
radiological scans, histopathology images, and molecular information with
clinical data is essential for developing a holistic understanding of the
disease and optimizing treatment. The need for integrating data from multiple
sources is further pronounced in complex diseases such as cancer for enabling
precision medicine and personalized treatments. This work proposes the Multimodal
Integration of Oncology Data System (MINDS) - a flexible, scalable, and
cost-effective metadata framework for efficiently fusing disparate data from
public sources such as the Cancer Research Data Commons (CRDC) into an
interconnected, patient-centric framework. MINDS offers an interface for
exploring relationships across data types and building cohorts for developing
large-scale multimodal machine learning models. By harmonizing multimodal data,
MINDS aims to empower researchers with greater analytical ability
to uncover diagnostic and prognostic insights and enable evidence-based
personalized care. MINDS tracks granular end-to-end data provenance, ensuring
reproducibility and transparency. The cloud-native architecture of MINDS can
handle exponential data growth in a secure, cost-optimized manner while
ensuring substantial storage optimization, replication avoidance, and dynamic
access capabilities. Auto-scaling, access controls, and other mechanisms
guarantee pipelines' scalability and security. MINDS overcomes the limitations
of existing biomedical data silos via an interoperable metadata-driven approach
that represents a pivotal step toward the future of oncology data integration.
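To make the patient-centric, metadata-driven design concrete, the sketch below shows a minimal, hypothetical cohort-building workflow in Python. It is not the actual MINDS API or schema; the table names, columns, and the build_cohort helper are illustrative assumptions. The idea it illustrates is that each modality is registered in a metadata store keyed by a shared patient identifier and pointing to files that remain in their source repositories (e.g., CRDC buckets), so cohorts are assembled by joining metadata rather than copying data.

```python
# Minimal, hypothetical sketch of a patient-centric metadata store and a
# cohort query in the spirit of MINDS. Table names, columns, and the
# build_cohort helper are illustrative assumptions, not the MINDS schema/API.
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for a cloud metadata database

# One row per record; every modality is keyed by a shared patient identifier
# and stores only a URI pointing back to the source repository.
conn.executescript("""
CREATE TABLE clinical  (patient_id TEXT, project TEXT, vital_status TEXT);
CREATE TABLE radiology (patient_id TEXT, modality TEXT, series_uri TEXT);
CREATE TABLE molecular (patient_id TEXT, assay TEXT, file_uri TEXT);
""")

conn.executemany("INSERT INTO clinical VALUES (?, ?, ?)",
                 [("P001", "TCGA-LUAD", "Alive"), ("P002", "TCGA-LUAD", "Dead")])
conn.executemany("INSERT INTO radiology VALUES (?, ?, ?)",
                 [("P001", "CT", "s3://bucket/P001/ct")])
conn.executemany("INSERT INTO molecular VALUES (?, ?, ?)",
                 [("P001", "RNA-Seq", "s3://bucket/P001/rna"),
                  ("P002", "RNA-Seq", "s3://bucket/P002/rna")])


def build_cohort(connection: sqlite3.Connection, project: str) -> pd.DataFrame:
    """Hypothetical helper: patients in `project` with both CT and RNA-Seq data."""
    query = """
        SELECT c.patient_id, c.vital_status, r.series_uri, m.file_uri
        FROM clinical c
        JOIN radiology r ON r.patient_id = c.patient_id AND r.modality = 'CT'
        JOIN molecular m ON m.patient_id = c.patient_id AND m.assay = 'RNA-Seq'
        WHERE c.project = ?
    """
    return pd.read_sql_query(query, connection, params=(project,))


cohort = build_cohort(conn, "TCGA-LUAD")
print(cohort)  # only P001 has clinical, imaging, and molecular records
```

Because the store holds only metadata and URIs, the underlying files are never duplicated; that is the sense in which a metadata framework supports storage optimization and replication avoidance while still letting researchers assemble large multimodal cohorts for model training.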
Related papers
- Prediction and Detection of Terminal Diseases Using Internet of Medical Things: A Review [4.4389631374821255]
AI-driven models have achieved over 98% accuracy in predicting heart disease, chronic kidney disease (CKD), Alzheimer's disease, and lung cancer.
The incorporation of IoMT data, which is vast and heterogeneous, adds complexities in ensuring interoperability and security to protect patient privacy.
Future research should focus on data standardization and advanced preprocessing techniques to improve data quality and interoperability.
arXiv Detail & Related papers (2024-09-22T15:02:33Z)
- Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
The capacity to generalize effectively on smaller datasets remains a persistent challenge.
We've combined a wide range of data sources to improve performance and generalization to new data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z)
- MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems [12.914295902429]
We introduce a real world multi-modal dataset called MMIST-CCRCC.
This dataset comprises two radiology modalities (CT and MRI), histopathology, genomics, and clinical data from 618 patients with clear cell renal cell carcinoma (ccRCC).
We show that even with severe rates of missing data, fusing the modalities improves survival forecasting.
arXiv Detail & Related papers (2024-05-02T18:29:05Z)
- XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z)
- HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data [10.774128925670183]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture.
We conduct multimodal survival analysis on whole-slide images and multi-omic data from four cancer datasets in The Cancer Genome Atlas (TCGA).
HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
arXiv Detail & Related papers (2023-11-15T17:06:26Z)
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using multimodal imaging and genetic data from the Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review [0.0]
Integrating diverse data types can improve the accuracy and reliability of cancer diagnosis and treatment.
Deep neural networks have facilitated the development of sophisticated multimodal data fusion approaches.
Recent deep learning frameworks such as Graph Neural Networks (GNNs) and Transformers have shown remarkable success in multimodal learning.
arXiv Detail & Related papers (2023-03-11T17:52:03Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on the application of elastic principal graphs, which can simultaneously address dimensionality reduction, data visualization, clustering, feature selection, and the quantification of geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net consistently improves performance across all datasets and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.