Related papers: From Linear Probing to Joint-Weighted Token Hierarchy: A Foundation Model Bridging Global and Cellular Representations in Biomarker Detection

URL: http://arxiv.org/abs/2511.05150v1
Date: Fri, 07 Nov 2025 11:05:36 GMT
Title: From Linear Probing to Joint-Weighted Token Hierarchy: A Foundation Model Bridging Global and Cellular Representations in Biomarker Detection
Authors: Jingsong Liu, Han Li, Nassir Navab, Peter J. Schüffler
Abstract summary: AI-based biomarkers can infer molecular features directly from hematoxylin & eosin (H&E) slides. Most pathology foundation models (PFMs) rely on global patch-level embeddings and overlook cell-level morphology. We present a PFM model, JWTH, which integrates large-scale self-supervised pretraining with cell-centric post-tuning and attention pooling to fuse local and global tokens.
Score: 44.3895875409365
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI-based biomarkers can infer molecular features directly from hematoxylin & eosin (H&E) slides, yet most pathology foundation models (PFMs) rely on global patch-level embeddings and overlook cell-level morphology. We present a PFM model, JWTH (Joint-Weighted Token Hierarchy), which integrates large-scale self-supervised pretraining with cell-centric post-tuning and attention pooling to fuse local and global tokens. Across four tasks involving four biomarkers and eight cohorts, JWTH achieves up to 8.3% higher balanced accuracy and 1.2% average improvement over prior PFMs, advancing interpretable and robust AI-based biomarker detection in digital pathology.

Related papers

Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
EXAONE Path 2.5: Pathology Foundation Model with Multi-Omics Alignment [7.030162358506499]
We present EXAONE Path 2.5, a pathology foundation model that jointly models histologic, genomic, epigenetic and transcriptomic modalities. We evaluate EXAONE Path 2.5 against six leading pathology foundation models across two complementary benchmarks.
arXiv Detail & Related papers (2025-12-16T02:31:53Z)
PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer [54.958921946378304]
We introduce PanFoMa, a lightweight hybrid neural network that combines the strengths of Transformers and state-space models. PanFoMa consists of a front-end local-context encoder with shared self-attention layers to capture complex, order-independent gene interactions. We also construct a large-scale pan-cancer single-cell benchmark, PanFoMaBench, containing over 3.5 million high-quality cells.
arXiv Detail & Related papers (2025-12-02T08:31:31Z)
Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling [74.25438319700929]
We propose CHMR (Cell-aware Hierarchical Multi-modal Representations), a robust framework that models local-global dependencies between molecules and cellular responses. Evaluated on nine public benchmarks spanning 728 tasks, CHMR outperforms state-of-the-art baselines. Results demonstrate the advantage of hierarchy-aware, multimodal learning for reliable and biologically grounded molecular representations.
arXiv Detail & Related papers (2025-11-26T07:15:00Z)
A Machine Learning Pipeline for Multiple Sclerosis Biomarker Discovery: Comparing explainable AI and Traditional Statistical Approaches [35.18016233072556]
We present a machine learning pipeline for biomarker discovery in Multiple Sclerosis (MS). After robust preprocessing, we trained an XGBoost classifier optimized via Bayesian search. Our comparison revealed both overlapping and unique biomarkers between SHAP and DEA, suggesting complementary strengths. This study highlights the value of combining explainable AI (xAI) with traditional statistical methods to gain deeper insights into disease mechanisms.
arXiv Detail & Related papers (2025-09-26T15:31:34Z)
The Next Layer: Augmenting Foundation Models with Structure-Preserving and Attention-Guided Learning for Local Patches to Global Context Awareness in Computational Pathology [23.32822092398391]
We present EAGLE-Net, a structure-preserving, attention-guided MIL architecture designed to augment prediction and interpretability. We benchmarked it on large pan-cancer datasets, including 3 cancer types for classification (10,260 slides) and 7 cancer types for survival prediction (4,172 slides).
arXiv Detail & Related papers (2025-08-27T14:19:38Z)
FoundBioNet: A Foundation-Based Model for IDH Genotyping of Glioma from Multi-Parametric MRI [1.4249472316161877]
We propose a Foundation-based Biomarker Network (FoundBioNet) to noninvasively predict IDH mutation status from multi-parametric MRI. Our model was trained and validated on a diverse, multi-center cohort of 1705 glioma patients from six public datasets. Our model achieved AUCs of 90.58%, 88.08%, 65.41%, and 80.31% on independent test sets from EGD, TCGA, Ivy GAP, RHUH, and UPenn.
arXiv Detail & Related papers (2025-08-09T00:08:10Z)
Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation [46.36100528165335]
Photoplethysmography and electrocardiography can potentially enable continuous blood pressure (BP) monitoring. Yet building accurate and robust machine learning (ML) models remains challenging due to variability in data quality and patient-specific factors. In this work, we investigate whether a model pre-trained on one modality can effectively be exploited to improve the accuracy of a different signal type. Our approach achieves near state-of-the-art accuracy for diastolic BP and surpasses by 1.5x the accuracy of prior works for systolic BP.
arXiv Detail & Related papers (2025-02-10T13:33:12Z)
Augmenting Biomedical Named Entity Recognition with General-domain Resources [47.24727904076347]
Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. We propose GERBERA, a simple-yet-effective method that utilizes general-domain NER datasets for training. We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances.
arXiv Detail & Related papers (2024-06-15T15:28:02Z)
Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma [21.281553456323998]
Biomarker detection is an indispensable part of the diagnosis and treatment of low-grade glioma (LGG). We propose an interpretable deep learning pipeline to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients, but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.
arXiv Detail & Related papers (2023-10-11T13:05:33Z)
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules. Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph. By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types. Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.