Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets
- URL: http://arxiv.org/abs/2310.07799v3
- Date: Tue, 11 Feb 2025 12:32:59 GMT
- Title: Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets
- Authors: Zhongji Zhang, Yuhang Wang, Yinghao Zhu, Xinyu Ma, Yasha Wang, Junyi Gao, Liantao Ma, Wen Tang, Xiaoyun Zhang, Ling Wang,
- Abstract summary: An effective prognostic model could assist physicians in making accurate diagnoses and designing personalized treatment plans.
limited data collection, insufficient clinical experience, and privacy and ethical concerns restrict data availability.
We present a domain-invariant representation learning method that constructs a transition model between source and target datasets.
- Score: 28.59271580918754
- License:
- Abstract: Emerging diseases present challenges in symptom recognition and timely clinical intervention due to limited available information. An effective prognostic model could assist physicians in making accurate diagnoses and designing personalized treatment plans to prevent adverse outcomes. However, in the early stages of disease emergence, several factors hamper model development: limited data collection, insufficient clinical experience, and privacy and ethical concerns restrict data availability and complicate accurate label assignment. Furthermore, Electronic Medical Record (EMR) data from different diseases or sources often exhibit significant cross-dataset feature misalignment, severely impacting the effectiveness of deep learning models. We present a domain-invariant representation learning method that constructs a transition model between source and target datasets. By constraining the distribution shift of features generated across different domains, we capture domain-invariant features specifically relevant to downstream tasks, developing a unified domain-invariant encoder that achieves better feature representation across various task domains. Experimental results across multiple target tasks demonstrate that our proposed model surpasses competing baseline methods and achieves faster training convergence, particularly when working with limited data. Extensive experiments validate our method's effectiveness in providing more accurate predictions for emerging pandemics and other diseases. Code is publicly available at https://github.com/wang1yuhang/domain_invariant_network.
Related papers
- Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.
Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.
Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.
Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - Few-shot Metric Domain Adaptation: Practical Learning Strategies for an Automated Plant Disease Diagnosis [2.7992435001846827]
Few-shot Metric Domain Adaptation (FMDA) is a flexible and effective approach for enhancing diagnostic accuracy in practical systems.
FMDA reduces domain discrepancies by introducing a constraint to the diagnostic model that minimizes the "distance" between feature spaces of source (training) data and target data with limited samples.
In large-scale experiments, FMDA achieved F1 score improvements of 11.1 to 29.3 points compared to cases without target data, using only 10 images per disease from the target domain.
arXiv Detail & Related papers (2024-12-25T10:01:30Z) - LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve the performance on specific medical tasks.
We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD)
LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv Detail & Related papers (2024-09-29T03:56:21Z) - Amplifying Pathological Detection in EEG Signaling Pathways through
Cross-Dataset Transfer Learning [10.212217551908525]
We study the effectiveness of data and model scaling and cross-dataset knowledge transfer in a real-world pathology classification task.
We identify the challenges of possible negative transfer and emphasize the significance of some key components.
Our findings indicate a small and generic model (e.g. ShallowNet) performs well on a single dataset, however, a larger model (e.g. TCN) performs better on transfer and learning from a larger and diverse dataset.
arXiv Detail & Related papers (2023-09-19T20:09:15Z) - Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites:
A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z) - Domain Generalization with Adversarial Intensity Attack for Medical
Image Segmentation [27.49427483473792]
In real-world scenarios, it is common for models to encounter data from new and different domains to which they were not exposed to during training.
domain generalization (DG) is a promising direction as it enables models to handle data from previously unseen domains.
We introduce a novel DG method called Adversarial Intensity Attack (AdverIN), which leverages adversarial training to generate training data with an infinite number of styles.
arXiv Detail & Related papers (2023-04-05T19:40:51Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - A Novel TSK Fuzzy System Incorporating Multi-view Collaborative Transfer
Learning for Personalized Epileptic EEG Detection [20.11589208667256]
We propose a TSK fuzzy system-based epilepsy detection algorithm that integrates multi-view collaborative transfer learning.
The proposed method has the potential to detect epileptic EEG signals effectively.
arXiv Detail & Related papers (2021-11-11T12:15:55Z) - Adversarial Sample Enhanced Domain Adaptation: A Case Study on
Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Cross-Domain Segmentation with Adversarial Loss and Covariate Shift for
Biomedical Imaging [2.1204495827342438]
This manuscript aims to implement a novel model that can learn robust representations from cross-domain data by encapsulating distinct and shared patterns from different modalities.
The tests on CT and MRI liver data acquired in routine clinical trials show that the proposed model outperforms all other baseline with a large margin.
arXiv Detail & Related papers (2020-06-08T07:35:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.