Patchwork Learning: A Paradigm Towards Integrative Analysis across
Diverse Biomedical Data Sources
- URL: http://arxiv.org/abs/2305.06217v2
- Date: Sat, 13 May 2023 12:33:05 GMT
- Authors: Suraj Rajendran, Weishen Pan, Mert R. Sabuncu, Yong Chen, Jiayu Zhou,
Fei Wang
- Abstract summary: "Patchwork learning" (PL) is a paradigm that integrates information from disparate datasets composed of different data modalities.
PL allows the simultaneous utilization of complementary data sources while preserving data privacy.
We present the concept of patchwork learning and its current implementations in healthcare, exploring the potential opportunities and applicable data sources.
- Score: 40.32772510980854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) in healthcare presents numerous opportunities for
enhancing patient care, population health, and healthcare providers' workflows.
However, the real-world clinical and cost benefits remain limited due to
challenges in data privacy, heterogeneous data sources, and the inability to
fully leverage multiple data modalities. In this perspective paper, we
introduce "patchwork learning" (PL), a novel paradigm that addresses these
limitations by integrating information from disparate datasets composed of
different data modalities (e.g., clinical free-text, medical images, omics) and
distributed across separate and secure sites. PL allows the simultaneous
utilization of complementary data sources while preserving data privacy,
enabling the development of more holistic and generalizable ML models. We
present the concept of patchwork learning and its current implementations in
healthcare, exploring the potential opportunities and applicable data sources
for addressing various healthcare challenges. PL leverages bridging modalities
or overlapping feature spaces across sites to facilitate information sharing
and impute missing data, thereby addressing related prediction tasks. We
discuss the challenges associated with PL, many of which are shared by
federated and multimodal learning, and provide recommendations for future
research in this field. By offering a more comprehensive approach to healthcare
data integration, patchwork learning has the potential to revolutionize the
clinical applicability of ML models. This paradigm promises to strike a balance
between personalization and generalizability, ultimately enhancing patient
experiences, improving population health, and optimizing healthcare providers'
workflows.
Related papers
- iASiS: Towards Heterogeneous Big Data Analysis for Personalized Medicine [28.917691563659467]
The iASiS infrastructure is able to convert clinical notes into usable data.
Semantic integration of data gives the opportunity to generate information that is rich, auditable, and reliable.
Data resources for two different disease categories are explored within the iASiS use cases, dementia and lung cancer.
arXiv Detail & Related papers (2024-07-09T10:52:19Z)
- Decentralised, Collaborative, and Privacy-preserving Machine Learning for Multi-Hospital Data [31.106733834322394]
We propose Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH).
We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets.
arXiv Detail & Related papers (2024-01-31T22:06:10Z)
- Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets [17.774341783844026]
This work proposes the Multimodal Integration of Oncology Data System (MINDS).
MINDS is a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources.
By harmonizing multimodal data, MINDS aims to empower researchers with greater analytical ability.
arXiv Detail & Related papers (2023-09-30T15:44:39Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using multimodal imaging genetic data from the Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
- Multimodal Learning for Multi-Omics: A Survey [4.15790071124993]
Multimodal learning for integrative multi-omics analysis can help researchers and practitioners gain deep insights into human diseases.
However, several challenges are hindering the development in this area, including the availability of easily accessible open-source tools.
This survey aims to provide an up-to-date overview of the data challenges, fusion approaches, datasets, and software tools from several new perspectives.
arXiv Detail & Related papers (2022-11-29T12:08:06Z)
- Decentralized Distributed Learning with Privacy-Preserving Data Synthesis [9.276097219140073]
In the medical field, multi-center collaborations are often sought to yield more generalizable findings by leveraging the heterogeneity of patient and clinical data.
Recent privacy regulations hinder the possibility to share data, and consequently, to come up with machine learning-based solutions that support diagnosis and prognosis.
We present a decentralized distributed method that integrates features from local nodes, providing models able to generalize across multiple datasets while maintaining privacy.
arXiv Detail & Related papers (2022-06-20T23:49:38Z)
- FLOP: Federated Learning on Medical Datasets using Partial Networks [84.54663831520853]
COVID-19, the disease caused by the novel coronavirus, has led to a shortage of medical resources.
Different data-driven deep learning models have been developed to aid the diagnosis of COVID-19.
The data itself is still scarce due to patient privacy concerns.
We propose a simple yet effective algorithm, named Federated Learning on Medical Datasets using Partial Networks (FLOP).
arXiv Detail & Related papers (2021-02-10T01:56:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.