Building Flexible, Scalable, and Machine Learning-ready Multimodal
Oncology Datasets
- URL: http://arxiv.org/abs/2310.01438v2
- Date: Fri, 22 Dec 2023 15:59:38 GMT
- Title: Building Flexible, Scalable, and Machine Learning-ready Multimodal
Oncology Datasets
- Authors: Aakash Tripathi, Asim Waqas, Kavya Venkatesan, Yasin Yilmaz, Ghulam
Rasool
- Abstract summary: This work proposes Multimodal Integration of Oncology Data System (MINDS)
MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources.
By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability.
- Score: 17.774341783844026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advancements in data acquisition, storage, and processing techniques have
resulted in the rapid growth of heterogeneous medical data. Integrating
radiological scans, histopathology images, and molecular information with
clinical data is essential for developing a holistic understanding of the
disease and optimizing treatment. The need for integrating data from multiple
sources is further pronounced in complex diseases such as cancer for enabling
precision medicine and personalized treatments. This work proposes Multimodal
Integration of Oncology Data System (MINDS) - a flexible, scalable, and
cost-effective metadata framework for efficiently fusing disparate data from
public sources such as the Cancer Research Data Commons (CRDC) into an
interconnected, patient-centric framework. MINDS offers an interface for
exploring relationships across data types and building cohorts for developing
large-scale multimodal machine learning models. By harmonizing multimodal data,
MINDS aims to potentially empower researchers with greater analytical ability
to uncover diagnostic and prognostic insights and enable evidence-based
personalized care. MINDS tracks granular end-to-end data provenance, ensuring
reproducibility and transparency. The cloud-native architecture of MINDS can
handle exponential data growth in a secure, cost-optimized manner while
ensuring substantial storage optimization, replication avoidance, and dynamic
access capabilities. Auto-scaling, access controls, and other mechanisms
guarantee pipelines' scalability and security. MINDS overcomes the limitations
of existing biomedical data silos via an interoperable metadata-driven approach
that represents a pivotal step toward the future of oncology data integration.
Related papers
- MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems [12.914295902429]
We introduce a real world multi-modal dataset called MMIST-CCRCC.
This dataset comprises 2 radiology modalities (CT and MRI), histopathology, genomics, and clinical data from 618 patients with clear cell renal cell carcinoma (ccRCC)
We show that even with such severe missing rates the fusion of modalities leads to improvements in the survival forecasting.
arXiv Detail & Related papers (2024-05-02T18:29:05Z) - Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced Modalities [9.476402318365446]
In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions.
We propose a solution by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL.
arXiv Detail & Related papers (2024-01-07T23:45:01Z) - XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z) - HEALNet -- Hybrid Multi-Modal Fusion for Heterogeneous Biomedical Data [12.109041184519281]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multi-modal fusion architecture.
We conduct multi-modal survival analysis on Whole Slide Images and Multi-omic data on four cancer cohorts of The Cancer Genome Atlas (TCGA)
HEALNet achieves state-of-the-art performance, substantially improving over both uni-modal and recent multi-modal baselines.
arXiv Detail & Related papers (2023-11-15T17:06:26Z) - Source-Free Collaborative Domain Adaptation via Multi-Perspective
Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state MRI functional (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z) - Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z) - Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review [0.0]
Integrating diverse data types can improve the accuracy and reliability of cancer diagnosis and treatment.
Deep neural networks have facilitated the development of sophisticated multimodal data fusion approaches.
Recent deep learning frameworks such as Graph Neural Networks (GNNs) and Transformers have shown remarkable success in multimodal learning.
arXiv Detail & Related papers (2023-03-11T17:52:03Z) - Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z) - Trajectories, bifurcations and pseudotime in large clinical datasets:
applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z) - MS-Net: Multi-Site Network for Improving Prostate Segmentation with
Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.