Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology
- URL: http://arxiv.org/abs/2408.09278v2
- Date: Fri, 21 Mar 2025 04:57:26 GMT
- Title: Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology
- Authors: Junchao Zhu, Mengmeng Yin, Ruining Deng, Yitian Long, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo
- Abstract summary: Training high-quality deep-learning models for layer segmentation relies on the availability of large amounts of annotated data. Because of patient privacy constraints and scarce clinical cases, constructing pathological datasets from clinical sources is relatively difficult and expensive. Cross-species data, such as mouse kidney data, which exhibits high structural and feature similarity to human kidneys, has the potential to enhance model performance.
- Score: 5.9813635886408045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate delineation of the boundaries between the renal cortex and medulla is crucial for subsequent functional structural analysis and disease diagnosis. Training high-quality deep-learning models for layer segmentation relies on the availability of large amounts of annotated data. However, because of patient privacy constraints on medical data and the scarcity of clinical cases, constructing pathological datasets from clinical sources is relatively difficult and expensive. Moreover, using external natural image datasets introduces noise during the domain generalization process. Cross-species homologous data, such as mouse kidney data, which exhibits high structural and feature similarity to human kidneys, has the potential to enhance model performance on human datasets. In this study, we incorporated a privately collected Periodic Acid-Schiff (PAS) stained mouse kidney dataset into the human kidney dataset for joint training. The results showed that after introducing cross-species homologous data, the semantic segmentation models based on CNN and Transformer architectures achieved an average increase of 1.77% and 1.24% in mIoU, and 1.76% and 0.89% in Dice score for the human renal cortex and medulla datasets, respectively. This approach is also capable of enhancing the model's generalization ability. This indicates that cross-species homologous data, as a low-noise trainable data source, can help improve model performance under conditions of limited clinical samples. Code is available at https://github.com/hrlblab/layer_segmentation.
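The abstract reports gains in mIoU and Dice score for the cortex and medulla classes. As a rough illustration of how these two metrics are commonly computed, here is a minimal sketch in plain Python. It is not taken from the authors' repository: the label encoding (0 = background, 1 = cortex, 2 = medulla), the function names, and the toy masks are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def iou_and_dice(pred: Sequence[int], target: Sequence[int], cls: int) -> Tuple[float, float]:
    """Return (IoU, Dice) for one class over flattened label masks."""
    tp = fp = fn = 0
    for p, t in zip(pred, target):
        if p == cls and t == cls:
            tp += 1  # pixel correctly assigned to the class
        elif p == cls:
            fp += 1  # predicted as the class, but labelled otherwise
        elif t == cls:
            fn += 1  # labelled as the class, but missed by the prediction
    if tp + fp + fn == 0:
        # Class absent from both masks; scoring it as perfect is a convention
        # chosen for this sketch, not something stated in the abstract.
        return 1.0, 1.0
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice


def mean_scores(pred: Sequence[int], target: Sequence[int], classes: Sequence[int]) -> Tuple[float, float]:
    """Average IoU and Dice over the given classes (mIoU, mean Dice)."""
    scores = [iou_and_dice(pred, target, c) for c in classes]
    miou = sum(s[0] for s in scores) / len(scores)
    mdice = sum(s[1] for s in scores) / len(scores)
    return miou, mdice


# Hypothetical flattened masks: 0 = background, 1 = cortex, 2 = medulla.
pred_mask = [1, 1, 2, 2, 0, 1]
true_mask = [1, 2, 2, 2, 0, 1]
miou, mdice = mean_scores(pred_mask, true_mask, classes=[1, 2])
print(f"mIoU={miou:.4f}, Dice={mdice:.4f}")
```

In this toy case one cortex pixel is a false positive and one medulla pixel is missed, so both classes score an IoU of 2/3 and a Dice of 0.8. Dice is never lower than IoU for the same prediction, which is why the two metrics are usually reported side by side.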
Related papers
- Robust Kidney Abnormality Segmentation: A Validation Study of an AI-Based Framework [3.225563371295004]
Kidney volume could serve as an important biomarker for renal diseases. Currently, clinical practice often relies on subjective visual assessment for evaluating kidney size and abnormalities. This research aims to develop a robust, thoroughly validated kidney abnormality segmentation algorithm.
arXiv Detail & Related papers (2025-05-12T13:53:19Z)
- Prototype-Guided Diffusion for Digital Pathology: Achieving Foundation Model Performance with Minimal Clinical Data [6.318463500874778]
We propose a prototype-guided diffusion model to generate high-fidelity synthetic pathology data at scale.
Our approach ensures biologically and diagnostically meaningful variations in the generated data.
We demonstrate that self-supervised features trained on our synthetic dataset achieve competitive performance despite using 60x-760x less data than models trained on large real-world datasets.
arXiv Detail & Related papers (2025-04-15T21:17:39Z)
- Synthetic ECG Generation for Data Augmentation and Transfer Learning in Arrhythmia Classification [1.7614607439356635]
We explore the usefulness of synthetic data generated with different generative models from Deep Learning.
We investigate the effects of transfer learning, by fine-tuning a synthetically pre-trained model and then adding increasing proportions of real data.
arXiv Detail & Related papers (2024-11-27T15:46:34Z)
- Inpainting Pathology in Lumbar Spine MRI with Latent Diffusion [4.410798232767917]
We propose an efficient method for inpainting pathological features onto healthy anatomy in MRI.
We evaluate the method's ability to insert disc herniation and central canal stenosis in lumbar spine sagittal T2 MRI.
arXiv Detail & Related papers (2024-06-04T16:47:47Z)
- Synthetic Data for Robust Stroke Segmentation [0.0]
Current deep learning-based approaches to lesion segmentation in neuroimaging often depend on high-resolution images and extensive annotated data.
This paper introduces a novel synthetic data framework tailored for stroke lesion segmentation.
Our approach trains models with label maps from healthy and stroke datasets, facilitating segmentation across both normal and pathological tissue.
arXiv Detail & Related papers (2024-04-02T13:42:29Z)
- Few-shot learning for COVID-19 Chest X-Ray Classification with Imbalanced Data: An Inter vs. Intra Domain Study [49.5374512525016]
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research.
Some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images.
We propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance.
arXiv Detail & Related papers (2024-01-18T16:59:27Z)
- Cross-Task Data Augmentation by Pseudo-label Generation for Region Based Coronary Artery Instance Segmentation [6.611985866622974]
Coronary Artery Diseases (CADs), although preventable, are one of the leading causes of death and disability.
Due to the limited amount of data and the difficulty in curating a dataset, the task of segmentation has proven challenging.
We introduce the use of pseudo-labels to address the issue of limited data in the angiographic dataset to enhance the performance of the baseline YOLO model.
arXiv Detail & Related papers (2023-10-08T04:54:12Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability [54.86482395312936]
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries.
We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split.
The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv Detail & Related papers (2021-02-18T21:14:52Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on the application of elastic principal graphs, which can simultaneously address the tasks of dimensionality reduction, data visualization, clustering, feature selection, and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)