Enhanced Diagnostic Performance via Large-Resolution Inference Optimization for Pathology Foundation Models
- URL: http://arxiv.org/abs/2601.12150v1
- Date: Sat, 17 Jan 2026 19:50:40 GMT
- Title: Enhanced Diagnostic Performance via Large-Resolution Inference Optimization for Pathology Foundation Models
- Authors: Mengxuan Hu, Zihan Guan, John Kang, Sheng Li, Zhongliang Zhou
- Abstract summary: A naive strategy is to either enlarge inputs or downsample the whole-slide images. We propose a space- and time-efficient inference strategy that sparsifies attention using spatially aware neighboring blocks. This design substantially reduces GPU memory and runtime during high-resolution WSI inference.
- Score: 12.8452590947141
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite their prominent performance on tasks such as ROI classification and segmentation, many pathology foundation models remain constrained by a specific input size, e.g., 224 x 224, creating substantial inefficiencies when applied to whole-slide images (WSIs), which span thousands of resolutions. A naive strategy is to either enlarge inputs or downsample the WSIs. However, enlarging inputs results in prohibitive GPU memory consumption, while downsampling alters the microns-per-pixel resolution and obscures critical morphological details. To overcome these limitations, we propose a space- and time-efficient inference strategy that sparsifies attention using spatially aware neighboring blocks and filters out non-informative tokens through global attention scores. This design substantially reduces GPU memory and runtime during high-resolution WSI inference while preserving and even improving the downstream performance, enabling inference at higher resolutions under the same GPU budget. The experimental results show that our method achieves up to a 7.67% improvement in ROI classification and comparable results in segmentation.
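The two ideas named in the abstract, attention restricted to spatially neighboring blocks and token filtering by attention score, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the 8-neighborhood block rule, the "total attention received" scoring, and every name below (`block_neighbor_mask`, `sparse_attention`, `keep_top_tokens`, `keep_ratio`) are assumptions made for illustration. The sketch builds a dense mask for readability, so it shows the sparsity pattern only and does not itself save memory.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def block_neighbor_mask(grid, block):
    """Boolean (N, N) mask over a grid x grid token map (N = grid * grid).

    Token i may attend to token j only if their blocks coincide or touch
    (8-neighborhood on the block grid). Hypothetical rule for illustration.
    """
    rows, cols = np.divmod(np.arange(grid * grid), grid)
    br, bc = rows // block, cols // block
    near_rows = np.abs(br[:, None] - br[None, :]) <= 1
    near_cols = np.abs(bc[:, None] - bc[None, :]) <= 1
    return near_rows & near_cols


def sparse_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed pairs masked out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def keep_top_tokens(weights, keep_ratio):
    """Indices of the tokens that receive the most total attention.

    Assumed scoring: column sums of the attention matrix.
    """
    received = weights.sum(axis=0)
    n_keep = max(1, int(round(keep_ratio * received.shape[0])))
    return np.sort(np.argsort(received)[::-1][:n_keep])


# Toy usage: an 8 x 8 token grid split into 2 x 2 blocks.
rng = np.random.default_rng(0)
grid, block, dim = 8, 2, 16
q, k, v = (rng.standard_normal((grid * grid, dim)) for _ in range(3))
mask = block_neighbor_mask(grid, block)
out, weights = sparse_attention(q, k, v, mask)
kept = keep_top_tokens(weights, keep_ratio=0.5)
print(out.shape, kept.shape, mask.mean())
```

In this toy setting each token attends to at most nine blocks instead of all sixteen, and the printed mask density is the fraction of query-key pairs that remain. A memory-efficient version would compute attention per block rather than masking a full N x N matrix.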
Related papers
- WSI-INR: Implicit Neural Representations for Lesion Segmentation in Whole-Slide Images [18.13897875757054]
Whole-slide images (WSIs) are fundamental for computational pathology, where accurate lesion segmentation is critical for clinical decision making. Existing methods partition WSIs into discrete patches, disrupting spatial continuity and treating multi-resolution views as independent samples. We propose WSI-INR, a novel patch-free framework based on Implicit Neural Representations (INRs). WSI-INR models the WSI as a continuous implicit function mapping spatial coordinates directly to tissue semantics features, outputting segmentation results while preserving intrinsic spatial information across the entire slide.
arXiv Detail & Related papers (2026-03-04T05:41:53Z)
- Cross-Layer Attentive Feature Upsampling for Low-latency Semantic Segmentation [52.01210390327581]
We propose Guided Attentive Interpolation (GAI) to adaptively interpolate fine-grained high-resolution features with semantic features. GAI determines both spatial and semantic relations of pixels from features of different resolutions and then leverages these relations to interpolate high-resolution features with rich semantics. In experiments, the GAI-based semantic segmentation networks, i.e., GAIN, achieve 78.8 mIoU with 22.3 FPS on Cityscapes and 80.6 mIoU with 64.5 FPS on CamVid using an NVIDIA 1080Ti GPU.
arXiv Detail & Related papers (2026-01-03T12:09:49Z)
- RARE-UNet: Resolution-Aligned Routing Entry for Adaptive Medical Image Segmentation [0.0]
We propose a resolution-aware multi-scale segmentation architecture that adapts its inference path to the spatial resolution of the input. RARE-UNet is tested on two benchmark brain imaging tasks for hippocampus and tumor segmentation. Our model achieves the highest average Dice scores of 0.84 and 0.65 across resolutions, while maintaining consistent performance and significantly reduced inference time at lower resolutions.
arXiv Detail & Related papers (2025-07-21T11:49:20Z)
- A Novel Downsampling Strategy Based on Information Complementarity for Medical Image Segmentation [1.9214752983226675]
This study proposes a downsampling method based on information complementarity: Hybrid Pooling Downsampling (HPD). The core is to replace the traditional method with MinMaxing, and effectively retain the light and dark contrast and detail features of the image by extracting the maximum value information of the local area. Experiments on various CNN architectures on the ACDC and Synapse datasets show that HPD outperforms traditional methods in segmentation performance and increases the DSC by 0.5% on average.
arXiv Detail & Related papers (2025-07-20T02:30:34Z)
- HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation [74.1872891313184]
HRSeg is an efficient model with high-resolution fine-grained perception. It features two key innovations: High-Resolution Perception (HRP) and High-Resolution Enhancement (HRE).
arXiv Detail & Related papers (2025-07-17T08:09:31Z)
- HRDecoder: High-Resolution Decoder Network for Fundus Image Lesion Segmentation [12.606794661369959]
We propose HRDecoder, a simple High-Resolution Decoder network for fundus lesion segmentation. It integrates a high-resolution representation learning module to capture fine-grained local features and a high-resolution fusion module to fuse multi-scale predictions. Our method effectively improves the overall segmentation accuracy of fundus lesions while consuming reasonable memory and computational overhead, and maintaining satisfying inference speed.
arXiv Detail & Related papers (2024-11-06T15:13:31Z)
- AUCSeg: AUC-oriented Pixel-level Long-tail Semantic Segmentation [88.50256898176269]
We develop a pixel-level AUC loss function and conduct a dependency-graph-based theoretical analysis of the algorithm's generalization ability.
We also design a Tail-Classes Memory Bank to manage the significant memory demand.
arXiv Detail & Related papers (2024-09-30T15:31:02Z)
- Learning to Be a Transformer to Pinpoint Anomalies [12.442574943138794]
Recent Industrial Anomaly Detection and Segmentation (IADS) methods process low-resolution images, e.g., 224x224 pixels, obtained by downsampling the original input images. We propose a novel Teacher-Student paradigm to leverage strong pre-trained features while processing high-resolution input images very efficiently. Our method can spot anomalies from high-resolution images and runs much faster than competitors.
arXiv Detail & Related papers (2024-07-04T17:59:26Z)
- Memory-Constrained Semantic Segmentation for Ultra-High Resolution UAV Imagery [35.96063342025938]
This paper explores the intricate problem of achieving efficient and effective segmentation of ultra-high resolution UAV imagery.
We propose a GPU memory-efficient and effective framework for local inference without accessing the context beyond local patches.
We present an efficient memory-based interaction scheme to correct potential semantic bias of the underlying high-resolution information.
arXiv Detail & Related papers (2023-10-07T07:44:59Z)
- ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance [61.04246102067351]
We propose a foreground harmonization framework (ARHNet) to tackle intensity disparities and make synthetic images look more realistic.
We demonstrate the efficacy of our method in improving the segmentation performance using real and synthetic images.
arXiv Detail & Related papers (2023-07-02T10:39:29Z)
- Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space.
In this study, we desire to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.