Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets
- URL: http://arxiv.org/abs/2310.07799v2
- Date: Thu, 25 Jan 2024 18:00:05 GMT
- Title: Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets
- Authors: Zhongji Zhang, Yuhang Wang, Yinghao Zhu, Xinyu Ma, Tianlong Wang, Chaohe Zhang, Yasha Wang, Liantao Ma
- Abstract summary: An effective prognostic model is expected to assist doctors in making the right diagnosis and designing personalized treatment plans.
In the early stage of a disease, limited data collection and clinical experience, together with privacy and ethical concerns, may restrict the data available for reference.
This article introduces a domain-invariant representation learning method to build a transition model from a source dataset to a target dataset.
- Score: 16.317118701435742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the limited information about emerging diseases, symptoms are hard
to notice and recognize, so the window for clinical intervention may be
missed. An effective prognostic model is expected to assist doctors in making
the right diagnosis and designing personalized treatment plans, so as to
promptly prevent unfavorable outcomes. However, in the early stage of a
disease, limited data collection and clinical experience, together with
privacy and ethical concerns, may restrict the data available for reference,
to the extent that even data labels are difficult to assign correctly. In
addition, Electronic Medical Record (EMR) data of different diseases, or from
different sources of the same disease, can suffer from serious cross-dataset
feature misalignment, greatly impairing the effectiveness of deep learning
models. This article introduces a domain-invariant representation learning
method to build a transition model from a source dataset to a target dataset.
By constraining the distribution shift of features generated in disparate
domains, the method captures domain-invariant features that are relevant
exclusively to downstream tasks, thereby training a unified domain-invariant
encoder across various task domains that achieves better feature
representation. Experimental results on several target tasks demonstrate that
the proposed model outperforms competing baseline methods and converges faster
during training, especially when data are limited. Extensive experiments have
demonstrated the efficacy of the method in providing more accurate predictions
for newly emergent pandemics and other diseases.
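The abstract says only that the method constrains the distribution shift between features generated in different domains; it does not name the discrepancy measure. As a minimal sketch, assuming a Gaussian-kernel maximum mean discrepancy (MMD) penalty (a common choice in domain adaptation, not confirmed as the authors' loss), the shift between a batch of source-domain and a batch of target-domain encoder features could be measured and added to the downstream task loss as below. The names `gaussian_kernel`, `mmd2`, `total_loss`, `lam`, and `sigma` are hypothetical.

```python
import numpy as np


def gaussian_kernel(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, using broadcasting
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point round-off
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def mmd2(source_feats: np.ndarray, target_feats: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two batches of encoder features."""
    k_ss = gaussian_kernel(source_feats, source_feats, sigma)
    k_tt = gaussian_kernel(target_feats, target_feats, sigma)
    k_st = gaussian_kernel(source_feats, target_feats, sigma)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())


def total_loss(task_loss: float, source_feats: np.ndarray, target_feats: np.ndarray,
               lam: float = 0.1, sigma: float = 1.0) -> float:
    """Hypothetical objective: downstream task loss plus a weighted shift penalty."""
    return task_loss + lam * mmd2(source_feats, target_feats, sigma)


# Usage: features from the same distribution should score lower than shifted ones
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 8))
tgt_same = rng.normal(0.0, 1.0, size=(64, 8))
tgt_shifted = rng.normal(2.0, 1.0, size=(64, 8))
print(mmd2(src, tgt_same), mmd2(src, tgt_shifted))
```

NumPy is used here only to show the arithmetic; in training, the same penalty would be computed inside a differentiable framework on the encoder's outputs, so that minimizing it pushes the encoder toward features whose source and target distributions overlap. The kernel bandwidth `sigma` and weight `lam` are illustrative values, not tuned settings.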
Related papers
- Cross-Task Data Augmentation by Pseudo-label Generation for Region Based Coronary Artery Instance Segmentation [6.611985866622974]
Coronary artery diseases (CADs), although preventable, are one of the leading causes of death and disability.
Due to the limited amount of data and the difficulty in curating a dataset, the task of segmentation has proven challenging.
We introduce the use of pseudo-labels to address the issue of limited data in the angiographic dataset to enhance the performance of the baseline YOLO model.
(2023-10-08T04:54:12Z)
- Dual-Reference Source-Free Active Domain Adaptation for Nasopharyngeal Carcinoma Tumor Segmentation across Multiple Hospitals [9.845637899896365]
Nasopharyngeal carcinoma (NPC) is a prevalent and clinically significant malignancy that predominantly impacts the head and neck area.
We propose a novel Source-Free Active Domain Adaptation (SFADA) framework to facilitate domain adaptation for the Gross Tumor Volume (GTV) segmentation task.
We collect a large-scale clinical dataset comprising 1057 NPC patients from five hospitals to validate our approach.
(2023-09-23T15:26:27Z)
- Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning [10.212217551908525]
We study the effectiveness of data and model scaling and cross-dataset knowledge transfer in a real-world pathology classification task.
We identify the challenges of possible negative transfer and emphasize the significance of some key components.
Our findings indicate that a small and generic model (e.g. ShallowNet) performs well on a single dataset; however, a larger model (e.g. TCN) performs better on transfer and on learning from a larger and more diverse dataset.
(2023-09-19T20:09:15Z)
- ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
(2023-09-03T07:55:46Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
(2023-08-31T00:36:10Z)
- Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation [27.49427483473792]
In real-world scenarios, it is common for models to encounter data from new and different domains to which they were not exposed during training.
Domain generalization (DG) is a promising direction, as it enables models to handle data from previously unseen domains.
We introduce a novel DG method called Adversarial Intensity Attack (AdverIN), which leverages adversarial training to generate training data with an infinite number of styles.
(2023-04-05T19:40:51Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
(2022-06-16T17:59:05Z)
- A Novel TSK Fuzzy System Incorporating Multi-view Collaborative Transfer Learning for Personalized Epileptic EEG Detection [20.11589208667256]
We propose a TSK fuzzy system-based epilepsy detection algorithm that integrates multi-view collaborative transfer learning.
The proposed method has the potential to detect epileptic EEG signals effectively.
(2021-11-11T12:15:55Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
(2021-01-13T03:20:20Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
(2020-09-02T02:50:30Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits unlabeled data by encouraging consistent predictions for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
(2020-05-15T06:57:54Z)