Related papers: Evaluating New AI Cell Foundation Models on Challenging Kidney Pathology Cases Unaddressed by Previous Foundation Models

URL: http://arxiv.org/abs/2510.01287v1
Date: Wed, 01 Oct 2025 00:38:36 GMT
Title: Evaluating New AI Cell Foundation Models on Challenging Kidney Pathology Cases Unaddressed by Previous Foundation Models
Authors: Runchen Wang, Junlin Guo, Siqi Lu, Ruining Deng, Zhengyi Lu, Yanfan Zhu, Yuechen Yang, Chongyu Qu, Yu Wang, Shilin Zhao, Catie Chang, Mitchell Wilkes, Mengmeng Yin, Haichun Yang, Yuankai Huo
Abstract summary: Accurate cell nuclei segmentation is critical for downstream tasks in kidney pathology. We benchmarked advanced AI cell foundation models against three widely used cell foundation models developed prior to 2024. CellViT++ [Virchow] yields the highest standalone performance with 40.3% of predictions rated as "Good" on a curated set of 2,091 challenging samples.
Score: 7.770106550946461
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate cell nuclei segmentation is critical for downstream tasks in kidney pathology and remains a major challenge due to the morphological diversity and imaging variability of renal tissues. While our prior work has evaluated early-generation AI cell foundation models in this domain, the effectiveness of recent cell foundation models remains unclear. In this study, we benchmark advanced AI cell foundation models (2025), including CellViT++ variants and Cellpose-SAM, against three widely used cell foundation models developed prior to 2024, using a diverse large-scale set of kidney image patches within a human-in-the-loop rating framework. We further performed fusion-based ensemble evaluation and model agreement analysis to assess the segmentation capabilities of the different models. Our results show that CellViT++ [Virchow] yields the highest standalone performance with 40.3% of predictions rated as "Good" on a curated set of 2,091 challenging samples, outperforming all prior models. In addition, our fused model achieves 62.2% "Good" predictions and only 0.4% "Bad", substantially reducing segmentation errors. Notably, the fusion model (2025) successfully resolved the majority of challenging cases that remained unaddressed in our previous study. These findings demonstrate the potential of AI cell foundation model development in renal pathology and provide a curated dataset of challenging samples to support future kidney-specific model refinement.

Related papers

Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology [6.289686541194788]
Existing foundation models either do not improve or only modestly improve over task-specific models in downstream applications. We scaled the pre-training dataset to 116 million cells, which is larger than those used by previous models. We trained the TEDDY family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters.
arXiv Detail & Related papers (2025-03-05T13:24:57Z)
Merging synthetic and real embryo data for advanced AI predictions [69.07284335967019]
We train two generative models using two datasets (one we created and made publicly available, and one existing public dataset) to generate synthetic embryo images at various cell stages. These were combined with real images to train classification models for embryo cell stage prediction. Our results demonstrate that incorporating synthetic images alongside real data improved classification performance, with the model achieving 97% accuracy compared to 94.5% when trained solely on real data.
arXiv Detail & Related papers (2024-12-02T08:24:49Z)
CRTRE: Causal Rule Generation with Target Trial Emulation Framework [47.2836994469923]
We introduce a novel method called causal rule generation with target trial emulation framework (CRTRE). CRTRE applies randomized trial design principles to estimate the causal effect of association rules. We then incorporate these association rules into downstream applications such as prediction of disease onset.
arXiv Detail & Related papers (2024-11-10T02:40:06Z)
How Good Are We? Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment [11.60167559546617]
Training AI foundation models has emerged as a promising large-scale learning approach for addressing real-world healthcare challenges. While many of these models have been developed for tasks like disease diagnosis and tissue quantification, their readiness for deployment on some of the arguably simplest tasks, such as nuclei segmentation within a single organ, remains uncertain. This paper seeks to answer this key question, "How good are we?", by thoroughly evaluating the performance of recent cell foundation models on a curated dataset.
arXiv Detail & Related papers (2024-10-31T17:00:33Z)
Benchmarking foundation models as feature extractors for weakly-supervised computational pathology [0.6151041580858937]
We benchmarked 19 histopathology foundation models on 13 patient cohorts with 6,818 patients and 9,528 slides from lung, colorectal, gastric, and breast cancers. We show that a vision-language foundation model, CONCH, yielded the highest performance when compared to vision-only foundation models, with Virchow2 as a close second.
arXiv Detail & Related papers (2024-08-28T14:34:45Z)
Assessment of Cell Nuclei AI Foundation Models in Kidney Pathology [11.60167559546617]
To our knowledge, this is the largest-scale evaluation of its kind to date. Among the evaluated models, CellViT demonstrated superior performance in segmenting nuclei in kidney pathology. However, none of the foundation models are perfect; a performance gap remains in general nuclei segmentation for kidney pathology.
arXiv Detail & Related papers (2024-08-09T22:34:13Z)
A Comprehensive Evaluation of Histopathology Foundation Models for Ovarian Cancer Subtype Classification [1.9499122087408571]
Histopathology foundation models show great promise across many tasks. We report the most rigorous single-task validation of histopathology foundation models to date. Histopathology foundation models offer a clear benefit to ovarian cancer subtyping.
arXiv Detail & Related papers (2024-05-16T11:21:02Z)
CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting [46.45578907156356]
We set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. Our findings suggest that nuclei and eosinophils play an important role in the tumour microenvironment.
arXiv Detail & Related papers (2023-03-11T01:21:13Z)
On the explainability of hospitalization prediction on a large COVID-19 patient dataset [45.82374977939355]
We develop various AI models to predict hospitalization on a large (over 110k) cohort of COVID-19 positive-tested US patients. Despite high data imbalance, the models reach average precision 0.96-0.98 (0.75-0.85), recall 0.96-0.98 (0.74-0.85), and F1 score 0.97-0.98 (0.79-0.83) on the non-hospitalized (or hospitalized) class.
arXiv Detail & Related papers (2021-10-28T10:23:38Z)
A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage. This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
Improved Techniques for Training Score-Based Generative Models [104.20217659157701]
We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces. We can effortlessly scale score-based generative models to images with unprecedented resolutions. Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets.
arXiv Detail & Related papers (2020-06-16T09:17:17Z)