A Multicenter Benchmark of Multiple Instance Learning Models for Lymphoma Subtyping from HE-stained Whole Slide Images
- URL: http://arxiv.org/abs/2512.14640v1
- Date: Tue, 16 Dec 2025 17:58:03 GMT
- Title: A Multicenter Benchmark of Multiple Instance Learning Models for Lymphoma Subtyping from HE-stained Whole Slide Images
- Authors: Rao Muhammad Umer, Daniel Sens, Jonathan Noll, Christian Matek, Lukas Wolfseher, Rainer Spang, Ralf Huss, Johannes Raffler, Sarah Reinke, Wolfram Klapper, Katja Steiger, Kristina Schwamborn, Carsten Marr
- Abstract summary: We present the first multicenter lymphoma benchmarking dataset covering four common lymphoma subtypes and healthy control tissue. We evaluate five publicly available pathology foundation models combined with attention-based (ABMIL) and transformer-based (TransMIL) multiple instance learning aggregators across three magnifications (10x, 20x, 40x). On in-distribution test sets, models achieve multi-class balanced accuracies exceeding 80% across all magnifications, with all foundation models performing similarly and both aggregation methods showing comparable results.
- Score: 1.2229392997318513
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Timely and accurate lymphoma diagnosis is essential for guiding cancer treatment. Standard diagnostic practice combines hematoxylin and eosin (HE)-stained whole slide images with immunohistochemistry, flow cytometry, and molecular genetic tests to determine lymphoma subtypes, a process requiring costly equipment, skilled personnel, and causing treatment delays. Deep learning methods could assist pathologists by extracting diagnostic information from routinely available HE-stained slides, yet comprehensive benchmarks for lymphoma subtyping on multicenter data are lacking. In this work, we present the first multicenter lymphoma benchmarking dataset covering four common lymphoma subtypes and healthy control tissue. We systematically evaluate five publicly available pathology foundation models (H-optimus-1, H0-mini, Virchow2, UNI2, Titan) combined with attention-based (AB-MIL) and transformer-based (TransMIL) multiple instance learning aggregators across three magnifications (10x, 20x, 40x). On in-distribution test sets, models achieve multiclass balanced accuracies exceeding 80% across all magnifications, with all foundation models performing similarly and both aggregation methods showing comparable results. The magnification study reveals that 10x resolution is sufficient, with no performance gains from higher resolutions or cross-magnification aggregation. However, on out-of-distribution test sets, performance drops substantially to around 60%, highlighting significant generalization challenges. To advance the field, larger multicenter studies covering additional rare lymphoma subtypes are needed. We provide an automated benchmarking pipeline to facilitate such future research.
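The benchmark's core mechanism is multiple instance learning: each whole slide image is cut into tiles, a frozen foundation model embeds every tile, and an aggregator such as ABMIL pools the tile embeddings into one slide-level representation for classification. Below is a minimal NumPy sketch of attention-based MIL pooling in the style of ABMIL; the actual aggregators in the paper are trained neural networks, and all array names, dimensions, and the random weights here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def abmil_pool(tile_feats, V, w):
    """Attention-based MIL pooling (ABMIL-style sketch).

    tile_feats: (n_tiles, d) tile embeddings from a frozen foundation model
    V:          (hidden, d) attention projection (learned in a real model)
    w:          (hidden,)   attention scoring vector (learned in a real model)
    Returns the attention-weighted slide embedding and the tile weights.
    """
    scores = np.tanh(tile_feats @ V.T) @ w      # (n_tiles,) raw attention scores
    scores = scores - scores.max()              # shift for numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax over tiles
    slide_emb = attn @ tile_feats               # (d,) weighted mean of tiles
    return slide_emb, attn

# Toy usage with random "embeddings" standing in for foundation-model features.
rng = np.random.default_rng(0)
n_tiles, d, hidden = 32, 16, 8
feats = rng.normal(size=(n_tiles, d))
V = rng.normal(size=(hidden, d))
w = rng.normal(size=hidden)
slide_emb, attn = abmil_pool(feats, V, w)
```

In a trained model, `V` and `w` are optimized end-to-end with the slide-level classifier, so high-attention tiles tend to align with diagnostically relevant regions; TransMIL replaces this weighted averaging with transformer self-attention across tiles.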
Related papers
- LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology [43.092364533480456]
Vision-threatening eye diseases pose a major global health burden, with timely diagnosis limited by workforce shortages and restricted access to specialized care. We present a large-scale multimodal ophthalmology benchmark comprising 32,633 instances with multi-granular annotations across 12 common ophthalmic conditions and 5 imaging modalities. The dataset integrates imaging, anatomical structures, demographics, and free-text annotations, supporting anatomical structure recognition, disease screening, disease staging, and demographic prediction for bias evaluation.
arXiv Detail & Related papers (2025-09-30T00:29:18Z) - Boosting Pathology Foundation Models via Few-shot Prompt-tuning for Rare Cancer Subtyping [80.92960114162746]
We propose PathPT, a novel framework that exploits the potential of vision-language pathology foundation models. PathPT converts WSI-level supervision into fine-grained tile-level guidance by leveraging the zero-shot capabilities of VL models. Results show that PathPT consistently delivers superior performance, achieving substantial gains in subtyping accuracy and cancerous region grounding ability.
arXiv Detail & Related papers (2025-08-21T18:04:41Z) - FoundBioNet: A Foundation-Based Model for IDH Genotyping of Glioma from Multi-Parametric MRI [1.4249472316161877]
We propose a Foundation-based Biomarker Network (FoundBioNet) to noninvasively predict IDH mutation status from multi-parametric MRI. Our model was trained and validated on a diverse, multi-center cohort of 1705 glioma patients from six public datasets. Our model achieved AUCs of 90.58%, 88.08%, 65.41%, and 80.31% on independent test sets from EGD, TCGA, Ivy GAP, RHUH, and UPenn.
arXiv Detail & Related papers (2025-08-09T00:08:10Z) - A Hybrid CNN-VSSM model for Multi-View, Multi-Task Mammography Analysis: Robust Diagnosis with Attention-Based Fusion [5.15423063632115]
Early and accurate interpretation of screening mammograms is essential for effective breast cancer detection. Existing AI approaches fall short by focusing on single-view inputs or single-task outputs. We propose a novel multi-view, multi-task hybrid deep learning framework that processes all four standard mammography views.
arXiv Detail & Related papers (2025-07-22T18:52:18Z) - GS-TransUNet: Integrated 2D Gaussian Splatting and Transformer UNet for Accurate Skin Lesion Analysis [44.99833362998488]
We present a novel approach that combines 2D Gaussian splatting with the Transformer UNet architecture for automated skin cancer diagnosis. Our findings illustrate significant advancements in the precision of segmentation and classification. This integration sets new benchmarks in the field and highlights the potential for further research into multi-task medical image analysis methodologies.
arXiv Detail & Related papers (2025-02-23T23:28:47Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Classification of lung cancer subtypes on CT images with synthetic pathological priors [41.75054301525535]
Cross-scale associations exist in the image patterns between the same case's CT images and its pathological images.
We propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on CT images.
arXiv Detail & Related papers (2023-08-09T02:04:05Z) - A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models [0.0]
Whole Slide Image (WSI) analysis using deep learning models has shown promising potential for cancer diagnosis.
We propose a vision transformer-based framework for distinguishing DLBCL cancer subtypes from high-resolution WSIs.
Our experimental study conducted on a lymphoma dataset of 157 patients shows the promising performance of our monomodal classification model.
arXiv Detail & Related papers (2023-08-02T17:05:36Z) - Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging [59.79875531898648]
DeepGlioma is an artificial-intelligence-based diagnostic screening system.
DeepGlioma can predict the molecular alterations used by the World Health Organization to define the adult-type diffuse glioma taxonomy.
arXiv Detail & Related papers (2023-03-23T18:50:18Z) - Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to-date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z) - Multi-View Hypercomplex Learning for Breast Cancer Screening [15.961240921898586]
We introduce multi-view hypercomplex learning, a novel learning paradigm for multi-view breast cancer classification. Thanks to hypercomplex algebra, our models intrinsically capture both intra- and inter-view relations. Our approach consistently outperforms state-of-the-art multi-view models.
arXiv Detail & Related papers (2022-04-12T13:32:31Z) - A Multi-Scale Conditional Deep Model for Tumor Cell Ratio Counting [4.164451715899639]
We propose a method to accurately obtain the ratio of tumor cells over an entire histological slide.
We use deep fully convolutional neural network models trained to detect and classify cells on images of H&E-stained tissue sections.
We show that combining two models, each working at a different magnification allows the system to capture both cell-level details and surrounding context.
arXiv Detail & Related papers (2021-01-27T22:40:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.