Related papers: Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets


URL: http://arxiv.org/abs/2310.01438v2
Date: Fri, 22 Dec 2023 15:59:38 GMT
Title: Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets
Authors: Aakash Tripathi, Asim Waqas, Kavya Venkatesan, Yasin Yilmaz, Ghulam Rasool
Abstract summary: This work proposes the Multimodal Integration of Oncology Data System (MINDS). MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources. By harmonizing multimodal data, MINDS aims to empower researchers with greater analytical ability.
Score: 17.774341783844026
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Advances in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need to integrate data from multiple sources is even more pronounced in complex diseases such as cancer, where it enables precision medicine and personalized treatment. This work proposes the Multimodal Integration of Oncology Data System (MINDS): a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms ensure the scalability and security of its pipelines. MINDS overcomes the limitations of existing biomedical data silos via an interoperable, metadata-driven approach that represents a pivotal step toward the future of oncology data integration.
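To make the "patient-centric metadata framework" and "building cohorts" ideas concrete, here is a minimal sketch in plain Python. It is not the MINDS implementation or API. The `Record` type, the `build_patient_index` and `build_cohort` functions, the patient IDs, source labels, and `hypothetical://` URIs are all illustrative assumptions. The sketch assumes each metadata record carries a patient identifier, a modality label, and a pointer to where the underlying data lives. Records are grouped per patient, and a cohort is the set of patients that have every requested modality and pass an optional filter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata record: points at data, does not copy it."""
    patient_id: str
    modality: str  # e.g. "clinical", "radiology", "histopathology"
    source: str    # hypothetical label for the originating repository
    uri: str       # reference to the raw data; only this pointer is indexed


def build_patient_index(records):
    """Group metadata records by patient ID, then by modality."""
    index = defaultdict(lambda: defaultdict(list))
    for rec in records:
        index[rec.patient_id][rec.modality].append(rec)
    return index


def build_cohort(index, required_modalities, predicate=lambda pid: True):
    """Return sorted IDs of patients that have every required modality
    and satisfy the (hypothetical) patient-level predicate."""
    return sorted(
        pid
        for pid, by_modality in index.items()
        # .get() avoids inserting empty entries into the defaultdict
        if all(by_modality.get(m) for m in required_modalities) and predicate(pid)
    )


# Illustrative records only; IDs, sources, and URIs are made up.
records = [
    Record("P001", "clinical", "source-A", "hypothetical://clinical/P001.json"),
    Record("P001", "histopathology", "source-B", "hypothetical://slides/P001.svs"),
    Record("P001", "radiology", "source-C", "hypothetical://ct/P001/"),
    Record("P002", "clinical", "source-A", "hypothetical://clinical/P002.json"),
    Record("P002", "histopathology", "source-B", "hypothetical://slides/P002.svs"),
    Record("P003", "clinical", "source-A", "hypothetical://clinical/P003.json"),
    Record("P003", "radiology", "source-C", "hypothetical://ct/P003/"),
    Record("P003", "histopathology", "source-B", "hypothetical://slides/P003.svs"),
]
primary_site = {"P001": "lung", "P002": "lung", "P003": "kidney"}

index = build_patient_index(records)
modalities = ["clinical", "histopathology", "radiology"]
cohort = build_cohort(index, modalities)
lung_cohort = build_cohort(index, modalities, lambda pid: primary_site.get(pid) == "lung")
print(cohort, lung_cohort)
```

Here `cohort` holds the patients with all three modalities, and `lung_cohort` further restricts them by a clinical attribute. Storing only references (`uri`) rather than copies of the images or sequencing files loosely mirrors the abstract's emphasis on storage optimization and replication avoidance, but how MINDS itself achieves this is not specified here.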

Related papers

From Classical Machine Learning to Emerging Foundation Models: Review on Multimodal Data Integration for Cancer Research [17.42746456656653]
Foundation models (FMs) offer new avenues for discovering biomarkers, improving diagnosis, and personalizing treatment. We examine emerging trends in machine learning (ML) and deep learning (DL). We identify the state-of-the-art FMs, publicly available multimodal repositories, and advanced tools and methods for data integration.
arXiv Detail & Related papers (2025-07-11T21:23:21Z)
Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
Prediction and Detection of Terminal Diseases Using Internet of Medical Things: A Review [4.4389631374821255]
AI-driven models have achieved over 98% accuracy in predicting heart disease, chronic kidney disease (CKD), Alzheimer's disease, and lung cancer. The incorporation of IoMT data, which is vast and heterogeneous, adds complexities in ensuring interoperability and security to protect patient privacy. Future research should focus on data standardization and advanced preprocessing techniques to improve data quality and interoperability.
arXiv Detail & Related papers (2024-09-22T15:02:33Z)
Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training. The capacity to generalize effectively on smaller datasets remains a persistent challenge. We've combined a wide range of data sources to improve performance and generalization to new data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z)
MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems [12.914295902429]
We introduce a real-world multimodal dataset called MMIST-ccRCC. This dataset comprises 2 radiology modalities (CT and MRI), histopathology, genomics, and clinical data from 618 patients with clear cell renal cell carcinoma (ccRCC). We show that even with severe rates of missing data, the fusion of modalities leads to improvements in survival forecasting.
arXiv Detail & Related papers (2024-05-02T18:29:05Z)
XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data. We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions. Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z)
HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data [10.774128925670183]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture. We conduct multimodal survival analysis on Whole Slide Images and multi-omic data on four cancer datasets from The Cancer Genome Atlas (TCGA). HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
arXiv Detail & Related papers (2023-11-15T17:06:26Z)
Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis. Many methods have been proposed to reduce fMRI heterogeneity between source and target domains. However, acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies. We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks. We apply our new method to predict cognitive degeneration and disease outcomes using the multimodal imaging genetic data from Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review [0.0]
Integrating diverse data types can improve the accuracy and reliability of cancer diagnosis and treatment. Deep neural networks have facilitated the development of sophisticated multimodal data fusion approaches. Recent deep learning frameworks such as Graph Neural Networks (GNNs) and Transformers have shown remarkable success in multimodal learning.
arXiv Detail & Related papers (2023-03-11T17:52:03Z)
Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation. GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values. The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations. Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.