MECFormer: Multi-task Whole Slide Image Classification with Expert Consultation Network
- URL: http://arxiv.org/abs/2410.04507v2
- Date: Tue, 8 Oct 2024 10:48:05 GMT
- Title: MECFormer: Multi-task Whole Slide Image Classification with Expert Consultation Network
- Authors: Doanh C. Bui, Jin Tae Kwak,
- Abstract summary: Whole slide image (WSI) classification is a crucial problem for cancer diagnostics in clinics and hospitals.
Previous MIL-based models designed for this problem have only been evaluated on individual tasks for specific organs.
We propose MECFormer, a generative Transformer-based model designed to handle multiple tasks within one model.
- Score: 2.6954348706500766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Whole slide image (WSI) classification is a crucial problem for cancer diagnostics in clinics and hospitals. A WSI, acquired at gigapixel size, is commonly tiled into patches and processed by multiple-instance learning (MIL) models. Previous MIL-based models designed for this problem have only been evaluated on individual tasks for specific organs, and the ability to handle multiple tasks within a single model has not been investigated. In this study, we propose MECFormer, a generative Transformer-based model designed to handle multiple tasks within one model. To leverage the power of learning multiple tasks simultaneously and to enhance the model's effectiveness in focusing on each individual task, we introduce an Expert Consultation Network, a projection layer placed at the beginning of the Transformer-based model. Additionally, to enable flexible classification, autoregressive decoding is incorporated by a language decoder for WSI classification. Through extensive experiments on five datasets involving four different organs, one cancer classification task, and four cancer subtyping tasks, MECFormer demonstrates superior performance compared to individual state-of-the-art multiple-instance learning models.
Related papers
- SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [73.49799596304418]
This paper introduces a new task called Multi-Modal datasets and Multi-Task Object Detection (M2Det) for remote sensing.
It is designed to accurately detect horizontal or oriented objects from any sensor modality.
This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z) - CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology [17.781388341968967]
CPath- Omni is the first LMM designed to unify both patch and WSI level image analysis.
CPath- Omni achieves state-of-the-art (SOTA) performance across seven diverse tasks on 39 out of 42 datasets.
CPath-CLIP, for the first time, integrates different vision models and incorporates a large language model as a text encoder to build a more powerful CLIP model.
arXiv Detail & Related papers (2024-12-16T18:46:58Z) - PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z) - M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis [16.326593081399775]
We propose an adapted architecture of Multi-gate Mixture-of-experts with Multi-proxy for Multiple instance learning (M4)
Our model achieved significant improvements across five tested TCGA datasets in comparison to current state-of-the-art single-task methods.
arXiv Detail & Related papers (2024-07-24T13:30:46Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - VIS-MAE: An Efficient Self-supervised Learning Approach on Medical Image Segmentation and Classification [33.699424327366856]
We present VIsualization and Masked AutoEncoder (VIS-MAE), novel model weights specifically designed for medical imaging.
VIS-MAE is trained on a dataset of 2.5 million unlabeled images from various modalities.
It is then adapted to classification and segmentation tasks using explicit labels.
arXiv Detail & Related papers (2024-02-01T21:45:12Z) - Gene-induced Multimodal Pre-training for Image-omic Classification [20.465959546613554]
This paper proposes a Gene-induced Multimodal Pre-training framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks.
Experimental results on the TCGA dataset show the superiority of our network architectures and our pre-training framework, achieving 99.47% in accuracy for image-omic classification.
arXiv Detail & Related papers (2023-09-06T04:30:15Z) - An Efficient General-Purpose Modular Vision Model via Multi-Task
Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z) - M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical
Image Segmentation [73.10707675345253]
We propose a general multi-scale in multi-scale subtraction network (M$2$SNet) to finish diverse segmentation from medical image.
Our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets of four different medical image segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:26:49Z) - MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z) - UNetFormer: A Unified Vision Transformer Model and Pre-Training
Framework for 3D Medical Image Segmentation [14.873473285148853]
We introduce a unified framework consisting of two architectures, dubbed UNetFormer, with a 3D Swin Transformer-based encoder and Conal Neural Network (CNN) and transformer-based decoders.
In the proposed model, the encoder is linked to the decoder via skip connections at five different resolutions with deep supervision.
We present a methodology for self-supervised pre-training of the encoder backbone via learning to predict randomly masked tokens.
arXiv Detail & Related papers (2022-04-01T17:38:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.