Multi-head Attention-based Deep Multiple Instance Learning
- URL: http://arxiv.org/abs/2404.05362v1
- Date: Mon, 8 Apr 2024 09:54:28 GMT
- Title: Multi-head Attention-based Deep Multiple Instance Learning
- Authors: Hassan Keshvarikhojasteh, Josien Pluim, Mitko Veta
- Abstract summary: MAD-MIL is a Multi-head Attention-based Deep Multiple Instance Learning model.
It is designed for weakly supervised Whole Slide Image (WSI) classification in digital pathology.
It is evaluated on MNIST-BAGS and on public datasets, including TUPAC16, TCGA BRCA, TCGA LUNG, and TCGA KIDNEY.
- Score: 1.0389304366020162
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces MAD-MIL, a Multi-head Attention-based Deep Multiple Instance Learning model designed for weakly supervised Whole Slide Image (WSI) classification in digital pathology. Inspired by the multi-head attention mechanism of the Transformer, MAD-MIL reduces model complexity while achieving results competitive with advanced models such as CLAM and DS-MIL. Evaluated on MNIST-BAGS and on public datasets, including TUPAC16, TCGA BRCA, TCGA LUNG, and TCGA KIDNEY, MAD-MIL consistently outperforms ABMIL. This demonstrates enhanced information diversity, interpretability, and efficiency in slide representation. The model's effectiveness, coupled with fewer trainable parameters and lower computational complexity, makes it a promising solution for automated pathology workflows. Our code is available at https://github.com/tueimage/MAD-MIL.
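A minimal sketch of attention-based MIL pooling with several heads may help make the abstract concrete. It assumes, as one plausible reading of "multi-head" in this setting, that each head applies ABMIL's non-gated attention score w^T tanh(V h_k) to its own slice of the instance features, and that the per-head bag embeddings are concatenated. The functions `attention_pool` and `multi_head_pool` are hypothetical and written in plain Python for readability; the authors' implementation (linked above) may differ, for example in gating, projection layers, or how heads are combined.

```python
import math
import random

# Sketch only: assumes per-head feature slicing with ABMIL-style attention.
# This is not the MAD-MIL repository code.


def attention_pool(instances, w, v):
    """ABMIL-style (non-gated) attention pooling for a single head.

    instances: list of n feature vectors, each a list of d floats
    v: projection matrix V as a list of L rows, each of length d
    w: scoring vector w as a list of L floats
    Returns (pooled d-vector, list of n attention weights summing to 1).
    """
    scores = []
    for h in instances:
        # score_k = w^T tanh(V h_k)
        hidden = [math.tanh(sum(vij * hj for vij, hj in zip(row, h))) for row in v]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attention-weighted average of the instance features.
    d = len(instances[0])
    pooled = [sum(a * h[j] for a, h in zip(attn, instances)) for j in range(d)]
    return pooled, attn


def multi_head_pool(instances, heads):
    """Hypothetical multi-head variant: slice each feature vector into
    len(heads) equal chunks, pool each chunk with its own (w, V) head,
    and concatenate the per-head bag embeddings."""
    d = len(instances[0])
    n_heads = len(heads)
    if d % n_heads:
        raise ValueError("feature dimension must be divisible by the number of heads")
    size = d // n_heads
    bag_embedding, head_attn = [], []
    for k, (w, v) in enumerate(heads):
        chunk = [h[k * size:(k + 1) * size] for h in instances]
        pooled, attn = attention_pool(chunk, w, v)
        bag_embedding.extend(pooled)
        head_attn.append(attn)
    return bag_embedding, head_attn


# Toy bag: 5 instances with 8-dim features, 2 heads, hidden size 4.
random.seed(0)
d, n_heads, hidden_dim = 8, 2, 4
size = d // n_heads
bag = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
heads = [
    (
        [random.gauss(0, 1) for _ in range(hidden_dim)],  # scoring vector w
        [[random.gauss(0, 1) for _ in range(size)] for _ in range(hidden_dim)],  # projection V
    )
    for _ in range(n_heads)
]
embedding, attn_per_head = multi_head_pool(bag, heads)
print(len(embedding))  # 8
```

Each head returns its own attention distribution over the instances, which is what allows one heatmap per head; a classifier would then be applied to the concatenated bag embedding.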
Related papers
- Large Language Models for Multimodal Deformable Image Registration [50.91473745610945]
We propose a novel coarse-to-fine multimodal deformable image registration (MDIR) framework, LLM-Morph, for aligning deep features from medical images of different modalities.
Specifically, we first use a CNN encoder to extract deep visual features from cross-modal image pairs, then use the first adapter to adjust these tokens, and apply LoRA to fine-tune the weights of pre-trained LLMs.
Next, to align the tokens, we use four further adapters to transform the LLM-encoded tokens into multi-scale visual features, generating multi-scale deformation fields for the coarse-to-fine MDIR task.
Date: 2024-08-20
- DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called the Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs a frame attention mechanism to identify the most significant frames and a skeleton attention mechanism to capture broader relationships across fixed partitions, with minimal parameters and FLOPs.
Date: 2024-06-05
- MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
Date: 2024-03-08
- SCHEME: Scalable Channel Mixer for Vision Transformers [52.605868919281086]
Vision Transformers have achieved impressive performance in many vision tasks.
Much less research has been devoted to the channel mixer, or feature mixing block (FFN or MLP).
We show that the dense connections can be replaced with a diagonal block structure that supports larger expansion ratios.
Date: 2023-12-01
- MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
MatFormer is a nested Transformer architecture designed to offer elasticity across a variety of deployment constraints.
We show that a 2.6B-parameter decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning 1.5B to 2.6B parameters.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
Date: 2023-10-11
- Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers each have their own advantages, and both have been widely used for dense prediction in multi-task learning (MTL).
We present a novel MTL model that combines the merits of both deformable CNNs and query-based Transformers, with shared gating, for multi-task learning of dense prediction.
Date: 2023-08-10
- Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification [18.679580844360615]
We propose a new Pseudo-bag Mixup (PseMix) data augmentation scheme to improve the training of MIL models.
Our scheme generalizes the Mixup strategy from general images to WSIs via pseudo-bags.
It is designed as an efficient and decoupled method, neither involving time-consuming operations nor relying on MIL model predictions.
Date: 2023-06-28
- Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification [32.43847786719133]
A whole slide image (WSI) is a type of high-resolution scanned tissue image that is extensively employed in computer-assisted diagnosis (CAD).
We propose a Multi-level MIL (MMIL) scheme that introduces a hierarchical structure to MIL, enabling efficient handling of MIL tasks involving a large number of instances.
Based on MMIL, we instantiate MMIL-Transformer, an efficient Transformer model with windowed exact self-attention for large-scale MIL tasks.
Date: 2023-06-08
- Feature Re-calibration based MIL for Whole Slide Image Classification [7.92885032436243]
Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases.
We propose to re-calibrate the distribution of a WSI bag (instances) by using the statistics of the max-instance (critical) feature.
We employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PSMA) with a Transformer encoder.
Date: 2022-06-22
- Differentiable Zooming for Multiple Instance Learning on Whole-Slide Images [4.928363812223965]
We propose ZoomMIL, a method that learns to perform multi-level zooming in an end-to-end manner.
The proposed method outperforms the state-of-the-art MIL methods in WSI classification on two large datasets.
Date: 2022-04-26
- DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification [18.11776334311096]
Multiple instance learning (MIL) has been increasingly used in the classification of histopathology whole slide images (WSIs).
We propose to virtually enlarge the number of bags by introducing the concept of pseudo-bags.
We also derive the instance probability under the attention-based MIL framework and use this derivation to help construct and analyze the proposed framework.
Date: 2022-03-22
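The pseudo-bag idea in the DTFD-MIL entry above can be illustrated with a small sketch. It assumes pseudo-bags are formed by a uniform random split of one slide's instances and that each pseudo-bag inherits the slide label; `split_into_pseudo_bags` is a hypothetical helper, not code from DTFD-MIL or PseMix, and the papers' own splitting and training schemes (such as DTFD-MIL's second distillation tier) are not reproduced here.

```python
import random

# Sketch only: uniform random pseudo-bag splitting is an assumption,
# not the exact procedure of any paper listed above.


def split_into_pseudo_bags(instances, bag_label, num_pseudo_bags, seed=None):
    """Randomly partition one bag's instances into pseudo-bags.

    Every pseudo-bag inherits the parent bag's label. For a positive bag this
    label can be wrong: a pseudo-bag may receive none of the (often few)
    positive instances.
    """
    if not 1 <= num_pseudo_bags <= len(instances):
        raise ValueError("need 1 <= num_pseudo_bags <= number of instances")
    rng = random.Random(seed)
    order = list(range(len(instances)))
    rng.shuffle(order)
    # Deal the shuffled indices round-robin so sizes differ by at most one.
    groups = [order[k::num_pseudo_bags] for k in range(num_pseudo_bags)]
    return [([instances[i] for i in group], bag_label) for group in groups]


# Toy slide with 10 patch identifiers and a positive slide-level label.
slide = [f"patch_{i}" for i in range(10)]
pseudo_bags = split_into_pseudo_bags(slide, bag_label=1, num_pseudo_bags=3, seed=42)
print([len(patches) for patches, _ in pseudo_bags])  # [4, 3, 3]
```

Dealing indices round-robin after the shuffle keeps pseudo-bag sizes balanced, so 10 instances split into 3 pseudo-bags always yields sizes 4, 3, 3, whatever the seed; only which patches land where changes.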