PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification
- URL: http://arxiv.org/abs/2503.16284v1
- Date: Thu, 20 Mar 2025 16:12:42 GMT
- Title: PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification
- Authors: Sharon Peled, Yosef E. Maruvka, Moti Freiman,
- Abstract summary: Whole Slide Images (WSIs) are high-resolution digital scans widely used in medical diagnostics.<n>We propose Probabilistic Spatial Attention MIL (PSA-MIL), a novel attention-based MIL framework that integrates spatial context into the attention mechanism.<n>We achieve state-of-the-art performance across both contextual and non-contextual baselines, while significantly reducing computational costs.
- Score: 3.1406146587437904
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Whole Slide Images (WSIs) are high-resolution digital scans widely used in medical diagnostics. WSI classification is typically approached using Multiple Instance Learning (MIL), where the slide is partitioned into tiles treated as interconnected instances. While attention-based MIL methods aim to identify the most informative tiles, they often fail to fully exploit the spatial relationships among them, potentially overlooking intricate tissue structures crucial for accurate diagnosis. To address this limitation, we propose Probabilistic Spatial Attention MIL (PSA-MIL), a novel attention-based MIL framework that integrates spatial context into the attention mechanism through learnable distance-decayed priors, formulated within a probabilistic interpretation of self-attention as a posterior distribution. This formulation enables a dynamic inference of spatial relationships during training, eliminating the need for predefined assumptions often imposed by previous approaches. Additionally, we suggest a spatial pruning strategy for the posterior, effectively reducing self-attention's quadratic complexity. To further enhance spatial modeling, we introduce a diversity loss that encourages variation among attention heads, ensuring each captures distinct spatial representations. Together, PSA-MIL enables a more data-driven and adaptive integration of spatial context, moving beyond predefined constraints. We achieve state-of-the-art performance across both contextual and non-contextual baselines, while significantly reducing computational costs.
Related papers
- Reduced Spatial Dependency for More General Video-level Deepfake Detection [9.51656628987442]
We propose a novel method called Spatial Dependency Reduction (SDR), which integrates common temporal consistency features from multiple spatially-perturbed clusters.
Extensive benchmarks and ablation studies demonstrate the effectiveness and rationale of our approach.
arXiv Detail & Related papers (2025-03-05T08:51:55Z) - PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis [1.1619559582563954]
We propose PRototype-Aware Graph Adaptative Aggregation for Spatial Multi-modal Omics Analysis (PRAGA)
PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrate spatial information and feature semantics.
The learnable graph structure can also denoise perturbations by learning cross-modal knowledge.
arXiv Detail & Related papers (2024-09-19T12:53:29Z) - CARMIL: Context-Aware Regularization on Multiple Instance Learning models for Whole Slide Images [0.41873161228906586]
Multiple Instance Learning models have proven effective for cancer prognosis from Whole Slide Images.
The original MIL formulation incorrectly assumes the patches of the same image to be independent.
We propose a versatile regularization scheme designed to seamlessly integrate spatial knowledge into any MIL model.
arXiv Detail & Related papers (2024-08-01T09:59:57Z) - GATGPT: A Pre-trained Large Language Model with Graph Attention Network
for Spatiotemporal Imputation [19.371155159744934]
In real-world settings, such data often contain missing elements due to issues like sensor malfunctions and data transmission errors.
The objective oftemporal imputation is to estimate these missing values by understanding the inherent spatial and temporal relationships in the observed time series.
Traditionally, intricatetemporal imputation has relied on specific architectures, which suffer from limited applicability and high computational complexity.
In contrast our approach integrates pre-trained large language models (LLMs) into intricatetemporal imputation, introducing a groundbreaking framework, GATGPT.
arXiv Detail & Related papers (2023-11-24T08:15:11Z) - Unified Domain Adaptive Semantic Segmentation [96.74199626935294]
Unsupervised Adaptive Domain Semantic (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled target domain.
We propose a Quad-directional Mixup (QuadMix) method, characterized by tackling distinct point attributes and feature inconsistencies.
Our method outperforms the state-of-the-art works by large margins on four challenging UDA-SS benchmarks.
arXiv Detail & Related papers (2023-11-22T09:18:49Z) - Improving Vision Anomaly Detection with the Guidance of Language
Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z) - Learning Multiscale Consistency for Self-supervised Electron Microscopy
Instance Segmentation [48.267001230607306]
We propose a pretraining framework that enhances multiscale consistency in EM volumes.
Our approach leverages a Siamese network architecture, integrating strong and weak data augmentations.
It effectively captures voxel and feature consistency, showing promise for learning transferable representations for EM analysis.
arXiv Detail & Related papers (2023-08-19T05:49:13Z) - Spatio-temporal Diffusion Point Processes [23.74522530140201]
patio-temporal point process (STPP) is a collection of events accompanied with time and space.
The failure to model the joint distribution leads to limited capacities in characterizing the pasthua-temporal interactions given events.
We propose a novel parameterization framework, which learns complex spatial-temporal joint distributions.
Our framework outperforms the state-of-the-art baselines remarkably, with an average improvement over 50%.
arXiv Detail & Related papers (2023-05-21T08:53:00Z) - Superficial White Matter Analysis: An Efficient Point-cloud-based Deep
Learning Framework with Supervised Contrastive Learning for Consistent
Tractography Parcellation across Populations and dMRI Acquisitions [68.41088365582831]
White matter parcellation classifies tractography streamlines into clusters or anatomically meaningful tracts.
Most parcellation methods focus on the deep white matter (DWM), whereas fewer methods address the superficial white matter (SWM) due to its complexity.
We propose a novel two-stage deep-learning-based framework, Superficial White Matter Analysis (SupWMA), that performs an efficient parcellation of 198 SWM clusters from whole-brain tractography.
arXiv Detail & Related papers (2022-07-18T23:07:53Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - Statistical control for spatio-temporal MEG/EEG source imaging with
desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) offer promise of non-invasive techniques.
The problem of source localization, or source imaging, poses however a high-dimensional statistical inference challenge.
We propose an ensemble of desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for
Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC)
First, spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we conduct a joint feature sequence based on the sequence and instant state information to make the generative trajectories keep spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.