Related papers: Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets

URL: http://arxiv.org/abs/2310.07799v2
Date: Thu, 25 Jan 2024 18:00:05 GMT
Title: Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets
Authors: Zhongji Zhang, Yuhang Wang, Yinghao Zhu, Xinyu Ma, Tianlong Wang, Chaohe Zhang, Yasha Wang, Liantao Ma
Abstract summary: An effective prognostic model is expected to assist doctors in making the right diagnosis and designing personalized treatment plans. In the early stage of a disease, limited data collection and clinical experience, together with privacy and ethical concerns, may restrict the data available for reference. This article introduces a domain-invariant representation learning method to build a transition model from a source dataset to a target dataset.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Due to the limited information about emerging diseases, symptoms are hard to notice and recognize, so the window for clinical intervention may be missed. An effective prognostic model is expected to assist doctors in making the right diagnosis and designing personalized treatment plans, so as to promptly prevent unfavorable outcomes. However, in the early stage of a disease, limited data collection and clinical experience, together with privacy and ethical concerns, may restrict the data available for reference, to the extent that even labeling the data correctly is difficult. In addition, Electronic Medical Record (EMR) data of different diseases, or of the same disease from different sources, can suffer from serious cross-dataset feature misalignment, which greatly impairs the effectiveness of deep learning models. This article introduces a domain-invariant representation learning method to build a transition model from a source dataset to a target dataset. By constraining the distribution shift between features generated in disparate domains, the method captures domain-invariant features that are relevant only to the downstream tasks, yielding a unified domain-invariant encoder across task domains with better feature representations. Experimental results on several target tasks demonstrate that our proposed model outperforms competing baseline methods and converges faster during training, especially when the amount of data is limited. Extensive experiments demonstrate the efficacy of our method in providing more accurate predictions for newly emergent pandemics and other diseases.
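The abstract does not say which measure the authors use to constrain the distribution shift between source and target features. As an illustration only, the sketch below assumes a kernel Maximum Mean Discrepancy (MMD) penalty, a common choice for aligning feature distributions, added to the task loss with a weight `lam`. The names `rbf_kernel`, `mmd2`, `domain_invariant_loss`, `gamma` and `lam` are hypothetical and are not the paper's API.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(source: List[Vector], target: List[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of feature vectors.

    Assumption: MMD is used here as an illustrative distribution-shift measure;
    the paper's actual constraint may differ.
    """
    if not source or not target:
        raise ValueError("both feature sets must be non-empty")

    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        # Average kernel value over all pairs drawn from a and b.
        return sum(rbf_kernel(x, y, gamma) for x in a for y in b) / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def domain_invariant_loss(task_loss: float,
                          source_feats: List[Vector],
                          target_feats: List[Vector],
                          lam: float = 0.1,
                          gamma: float = 1.0) -> float:
    """Hypothetical objective: task loss plus a weighted feature-shift penalty."""
    return task_loss + lam * mmd2(source_feats, target_feats, gamma)


if __name__ == "__main__":
    src = [[0.0, 0.0], [0.1, 0.0]]
    tgt = [[3.0, 3.0], [3.1, 3.0]]
    print(mmd2(src, src))  # identical sets: no shift
    print(mmd2(src, tgt))  # well-separated sets: large penalty
    print(domain_invariant_loss(1.0, src, tgt, lam=0.1))
```

In a real training loop the features would come from the shared encoder, and the penalty would be minimized jointly with the prediction loss. This pushes the encoder toward representations whose source and target distributions overlap.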

Related papers

Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
Few-shot Metric Domain Adaptation: Practical Learning Strategies for an Automated Plant Disease Diagnosis [2.7992435001846827]
Few-shot Metric Domain Adaptation (FMDA) is a flexible and effective approach for enhancing diagnostic accuracy in practical systems. FMDA reduces domain discrepancies by introducing a constraint to the diagnostic model that minimizes the "distance" between feature spaces of source (training) data and target data with limited samples. In large-scale experiments, FMDA achieved F1 score improvements of 11.1 to 29.3 points compared to cases without target data, using only 10 images per disease from the target domain.
arXiv Detail & Related papers (2024-12-25T10:01:30Z)
LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve the performance on specific medical tasks. We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD). LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv Detail & Related papers (2024-09-29T03:56:21Z)
Cross-Task Data Augmentation by Pseudo-label Generation for Region Based Coronary Artery Instance Segmentation [6.611985866622974]
Coronary Artery Diseases (CADs), although preventable, are among the leading causes of death and disability. Due to the limited amount of data and the difficulty in curating a dataset, the task of segmentation has proven challenging. We introduce the use of pseudo-labels to address the issue of limited data in the angiographic dataset to enhance the performance of the baseline YOLO model.
arXiv Detail & Related papers (2023-10-08T04:54:12Z)
Dual-Reference Source-Free Active Domain Adaptation for Nasopharyngeal Carcinoma Tumor Segmentation across Multiple Hospitals [9.845637899896365]
Nasopharyngeal carcinoma (NPC) is a prevalent and clinically significant malignancy that predominantly impacts the head and neck area. We propose a novel Source-Free Active Domain Adaptation (SFADA) framework to facilitate domain adaptation for the Gross Tumor Volume (GTV) segmentation task. We collect a large-scale clinical dataset comprising 1057 NPC patients from five hospitals to validate our approach.
arXiv Detail & Related papers (2023-09-23T15:26:27Z)
Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning [10.212217551908525]
We study the effectiveness of data and model scaling and cross-dataset knowledge transfer in a real-world pathology classification task. We identify the challenges of possible negative transfer and emphasize the significance of some key components. Our findings indicate that a small and generic model (e.g. ShallowNet) performs well on a single dataset; however, a larger model (e.g. TCN) performs better on transfer and on learning from a larger and more diverse dataset.
arXiv Detail & Related papers (2023-09-19T20:09:15Z)
ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment. The scarcity of annotated data limits the effectiveness and generalization of existing methods. We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions. We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation [27.49427483473792]
In real-world scenarios, it is common for models to encounter data from new and different domains to which they were not exposed during training. Domain generalization (DG) is a promising direction, as it enables models to handle data from previously unseen domains. We introduce a novel DG method called Adversarial Intensity Attack (AdverIN), which leverages adversarial training to generate training data with an infinite number of styles.
arXiv Detail & Related papers (2023-04-05T19:40:51Z)
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem. Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools. We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
A Novel TSK Fuzzy System Incorporating Multi-view Collaborative Transfer Learning for Personalized Epileptic EEG Detection [20.11589208667256]
We propose a TSK fuzzy system-based epilepsy detection algorithm that integrates multi-view collaborative transfer learning. The proposed method has the potential to detect epileptic EEG signals effectively.
arXiv Detail & Related papers (2021-11-11T12:15:55Z)
Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation. Adversarially generated samples are used during domain adaptation. Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, the Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
Cross-Domain Segmentation with Adversarial Loss and Covariate Shift for Biomedical Imaging [2.1204495827342438]
This manuscript aims to implement a novel model that can learn robust representations from cross-domain data by encapsulating distinct and shared patterns from different modalities. The tests on CT and MRI liver data acquired in routine clinical trials show that the proposed model outperforms all other baselines by a large margin.
arXiv Detail & Related papers (2020-06-08T07:35:55Z)
Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification. It exploits unlabeled data by encouraging prediction consistency for a given input under perturbations. Our method outperforms many state-of-the-art semi-supervised learning methods in both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.