Data Efficiency and Transfer Robustness in Biomedical Image Segmentation: A Study of Redundancy and Forgetting with Cellpose
- URL: http://arxiv.org/abs/2511.04803v1
- Date: Thu, 06 Nov 2025 20:45:56 GMT
- Title: Data Efficiency and Transfer Robustness in Biomedical Image Segmentation: A Study of Redundancy and Forgetting with Cellpose
- Authors: Shuo Zhao, Jianxu Chen,
- Abstract summary: Generalist biomedical image segmentation models such as Cellpose are increasingly applied across diverse imaging modalities and cell types.<n>In this study, we conduct a systematic empirical analysis of challenges using Cellpose as a case study.<n>To assess data redundancy, we propose a simple dataset quantization (DQ) strategy for constructing compact yet diverse training subsets.<n>To examine catastrophic forgetting, we perform cross domain finetuning experiments and observe significant degradation in source domain performance.
- Score: 2.37771211295026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalist biomedical image segmentation models such as Cellpose are increasingly applied across diverse imaging modalities and cell types. However, two critical challenges remain underexplored: (1) the extent of training data redundancy and (2) the impact of cross domain transfer on model retention. In this study, we conduct a systematic empirical analysis of these challenges using Cellpose as a case study. First, to assess data redundancy, we propose a simple dataset quantization (DQ) strategy for constructing compact yet diverse training subsets. Experiments on the Cyto dataset show that image segmentation performance saturates with only 10% of the data, revealing substantial redundancy and potential for training with minimal annotations. Latent space analysis using MAE embeddings and t-SNE confirms that DQ selected patches capture greater feature diversity than random sampling. Second, to examine catastrophic forgetting, we perform cross domain finetuning experiments and observe significant degradation in source domain performance, particularly when adapting from generalist to specialist domains. We demonstrate that selective DQ based replay reintroducing just 5-10% of the source data effectively restores source performance, while full replay can hinder target adaptation. Additionally, we find that training domain sequencing improves generalization and reduces forgetting in multi stage transfer. Our findings highlight the importance of data centric design in biomedical image segmentation and suggest that efficient training requires not only compact subsets but also retention aware learning strategies and informed domain ordering. The code is available at https://github.com/MMV-Lab/biomedseg-efficiency.
Related papers
- Bridging the Applicator Gap with Data-Doping:Dual-Domain Learning for Precise Bladder Segmentation in CT-Guided Brachytherapy [0.3058685580689604]
We investigate whether samples from a shifted distribution can effectively support learning when combined with limited target domain data.<n>While CT scans without brachytherapy applicators (no applicator: NA) are widely available, scans with applicators inserted (with applicator: WA) are scarce.<n>We propose a dual domain learning strategy that integrates NA and WA CT data to improve robustness and generalizability.
arXiv Detail & Related papers (2026-01-28T06:50:37Z) - Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies.<n>Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs.<n>We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z) - Re-Visible Dual-Domain Self-Supervised Deep Unfolding Network for MRI Reconstruction [48.30341580103962]
We propose a novel re-visible dual-domain self-supervised deep unfolding network to address these issues.<n>We design a deep unfolding network based on Chambolle and Pock Proximal Point Algorithm (DUN-CP-PPA) to achieve end-to-end reconstruction.<n> Experiments conducted on the fastMRI and IXI datasets demonstrate that our method significantly outperforms state-of-the-art approaches in terms of reconstruction performance.
arXiv Detail & Related papers (2025-01-07T12:29:32Z) - ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic
Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z) - Curriculum-Based Augmented Fourier Domain Adaptation for Robust Medical
Image Segmentation [18.830738606514736]
This work proposes the Curriculum-based Augmented Fourier Domain Adaptation (Curri-AFDA) for robust medical image segmentation.
In particular, our curriculum learning strategy is based on the causal relationship of a model under different levels of data shift.
Experiments on two segmentation tasks of Retina and Nuclei collected from multiple sites and scanners suggest that our proposed method yields superior adaptation and generalization performance.
arXiv Detail & Related papers (2023-06-06T08:56:58Z) - Frequency Disentangled Learning for Segmentation of Midbrain Structures
from Quantitative Susceptibility Mapping Data [1.9150304734969674]
Deep models tend to fit the target function from low to high frequencies.
One often lacks sufficient samples for training deep segmentation models.
We propose a new training method based on frequency-domain disentanglement.
arXiv Detail & Related papers (2023-02-25T04:30:11Z) - Deep learning based domain adaptation for mitochondria segmentation on
EM volumes [5.682594415267948]
We present three unsupervised domain adaptation strategies to improve mitochondria segmentation in the target domain.
We propose a new training stopping criterion based on morphological priors obtained exclusively in the source domain.
In the absence of validation labels, monitoring our proposed morphology-based metric is an intuitive and effective way to stop the training process and select in average optimal models.
arXiv Detail & Related papers (2022-02-22T09:49:25Z) - Self-Supervised Pre-Training for Transformer-Based Person
Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID)
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z) - Cross-Site Severity Assessment of COVID-19 from CT Images via Domain
Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images offers a great help to the estimation of intensive care unit event.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z) - Cross-Domain Segmentation with Adversarial Loss and Covariate Shift for
Biomedical Imaging [2.1204495827342438]
This manuscript aims to implement a novel model that can learn robust representations from cross-domain data by encapsulating distinct and shared patterns from different modalities.
The tests on CT and MRI liver data acquired in routine clinical trials show that the proposed model outperforms all other baseline with a large margin.
arXiv Detail & Related papers (2020-06-08T07:35:55Z) - MS-Net: Multi-Site Network for Improving Prostate Segmentation with
Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.