Full-resolution MLPs Empower Medical Dense Prediction
- URL: http://arxiv.org/abs/2311.16707v1
- Date: Tue, 28 Nov 2023 11:32:23 GMT
- Title: Full-resolution MLPs Empower Medical Dense Prediction
- Authors: Mingyuan Meng, Yuxin Xue, Dagan Feng, Lei Bi, and Jinman Kim
- Abstract summary: Multi-layer Perceptrons (MLPs) are superior alternatives to transformers in medical dense prediction.
Our framework achieves state-of-the-art performance on various medical dense prediction tasks.
- Score: 11.195630893999203
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dense prediction is a fundamental requirement for many medical vision tasks
such as medical image restoration, registration, and segmentation. The most
popular vision model, Convolutional Neural Networks (CNNs), has reached
bottlenecks due to the intrinsic locality of convolution operations. Recently,
transformers have been widely adopted for dense prediction for their capability
to capture long-range visual dependence. However, due to the high computational
complexity and large memory consumption of self-attention operations,
transformers are usually used at downsampled feature resolutions. Such usage
cannot effectively leverage the tissue-level textural information available
only at the full image resolution. This textural information is crucial for
medical dense prediction as it can differentiate the subtle human anatomy in
medical images. In this study, we hypothesize that Multi-layer Perceptrons
(MLPs) are superior alternatives to transformers in medical dense prediction
where tissue-level details dominate the performance, as MLPs enable long-range
dependence at the full image resolution. To validate our hypothesis, we develop
a full-resolution hierarchical MLP framework that uses MLPs beginning from the
full image resolution. We evaluate this framework with various MLP blocks on a
wide range of medical dense prediction tasks including restoration,
registration, and segmentation. Extensive experiments on six public
well-benchmarked datasets show that, by simply using MLPs at full resolution,
our framework outperforms its CNN and transformer counterparts and achieves
state-of-the-art performance on various medical dense prediction tasks.
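No code accompanies this listing; as a minimal NumPy sketch of the core idea, here is an MLP that mixes information across all spatial tokens of a full-resolution feature map (in the spirit of MLP-Mixer token mixing; the function name, shapes, and residual layout are illustrative assumptions, not the paper's actual blocks):

```python
import numpy as np

rng = np.random.default_rng(0)

def token_mixing_block(x, hidden):
    """One token-mixing MLP block over a flattened feature map.

    x: (C, N) array with C channels and N spatial tokens kept at the
    full image resolution. Every output token is a learned combination
    of all N input tokens, so long-range dependence is captured without
    materialising an O(N^2) attention score matrix.
    """
    C, N = x.shape
    w1 = rng.standard_normal((N, hidden)) * (1.0 / np.sqrt(N))
    w2 = rng.standard_normal((hidden, N)) * (1.0 / np.sqrt(hidden))
    h = np.maximum(x @ w1, 0.0)  # (C, hidden), ReLU
    return x + h @ w2            # residual connection, (C, N)

# An 8x8 "image" with 4 channels, flattened to 64 tokens.
feat = rng.standard_normal((4, 64))
out = token_mixing_block(feat, hidden=32)
print(out.shape)  # (4, 64)
```

In practice the paper builds a hierarchical framework and evaluates several MLP block designs; this sketch only shows why per-token MLP mixing stays tractable where full self-attention at the same resolution would not.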
Related papers
- MOSMOS: Multi-organ segmentation facilitated by medical report supervision [10.396987980136602]
We propose a novel pre-training & fine-tuning framework for Multi-Organ Supervision (MOS).
Specifically, we first introduce global contrastive learning to align medical image-report pairs in the pre-training stage.
To remedy the discrepancy between image-level report supervision and pixel-level segmentation, we further leverage multi-label recognition to implicitly learn the semantic correspondence between image pixels and organ tags.
arXiv Detail & Related papers (2024-09-04T03:46:17Z) - MM-UNet: A Mixed MLP Architecture for Improved Ophthalmic Image Segmentation [3.2846676620336632]
Ophthalmic image segmentation serves as a critical foundation for ocular disease diagnosis.
Transformer-based models address the locality limitations of CNNs but introduce substantial computational overhead.
We introduce MM-UNet, an efficient mixed MLP model tailored for ophthalmic image segmentation.
arXiv Detail & Related papers (2024-08-16T08:34:50Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
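The adapter design is not detailed in this summary; as a generic NumPy sketch of a bottleneck residual adapter attached to frozen encoder features (the rank `r`, scaling `alpha`, and all names are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

def residual_adapter(feat, down, up, alpha=0.1):
    """Bottleneck adapter added residually to frozen encoder features.

    feat: (N, D) token features from one level of a frozen visual
    encoder. down: (D, r) and up: (r, D) are the only trainable
    weights (r << D), so adaptation is cheap while the backbone
    stays frozen; stacking one adapter per level gives the stepwise
    multi-level enhancement described above.
    """
    h = np.maximum(feat @ down, 0.0)  # project to rank-r bottleneck, ReLU
    return feat + alpha * (h @ up)    # small residual update

D, r = 16, 4
feat = rng.standard_normal((10, D))
down = rng.standard_normal((D, r)) * 0.1
up = rng.standard_normal((r, D)) * 0.1
adapted = residual_adapter(feat, down, up)
print(adapted.shape)  # (10, 16)
```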
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder [26.830574964308962]
We introduce MedFLIP, a Fast Language-Image Pre-training method for Medical analysis.
We explore MAEs for zero-shot learning across domains, which enhances the model's ability to learn from limited data.
Lastly, we validate that using language improves zero-shot performance for medical image analysis.
arXiv Detail & Related papers (2024-03-07T16:11:43Z) - XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation framework with a domain transfer network (C2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information, to the extent that it can achieve the same performance with as little as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z) - Pyramid Medical Transformer for Medical Image Segmentation [8.157373686645318]
We develop a novel method to integrate multi-scale attention and CNN feature extraction using a pyramidal network architecture, namely Pyramid Medical Transformer (PMTrans).
Experimental results on two medical image datasets, gland segmentation and MoNuSeg datasets, showed that PMTrans outperformed the latest CNN-based and transformer-based models for medical image segmentation.
arXiv Detail & Related papers (2021-04-29T23:57:20Z) - TransMed: Transformers Advance Multi-modal Medical Image Classification [4.500880052705654]
Convolutional neural networks (CNNs) have shown very competitive performance in medical image analysis tasks.
Transformers have been applied to computer vision and achieved remarkable success in large-scale datasets.
TransMed combines the advantages of CNN and transformer to efficiently extract low-level features of images.
arXiv Detail & Related papers (2021-03-10T08:57:53Z) - Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
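As a rough NumPy sketch of axial attention with gating, simplified from the summary above (the paper gates learned positional terms inside self-attention; here a single scalar gate on the attention update stands in, and all names and shapes are illustrative assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_axial_attention_w(x, wq, wk, wv, gate):
    """Self-attention applied independently along the width axis.

    x: (H, W, D). Each of the H rows attends only over its own W
    positions, so the cost is O(H * W^2) rather than O((H*W)^2) for
    full 2-D self-attention. `gate` is a learned scalar that scales
    the attention update (a simplification of the paper's gated
    positional terms).
    """
    q, k, v = x @ wq, x @ wk, x @ wv  # (H, W, D) each
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])  # (H, W, W)
    attn = softmax(scores, axis=-1)   # each row sums to 1 over W
    return x + gate * (attn @ v)      # gated residual update

rng = np.random.default_rng(2)
H, W, D = 6, 8, 16
x = rng.standard_normal((H, W, D))
w = [rng.standard_normal((D, D)) * 0.1 for _ in range(3)]
y = gated_axial_attention_w(x, *w, gate=0.5)
print(y.shape)  # (6, 8, 16)
```

A second pass along the height axis (transposing the first two axes) would complete one axial-attention layer.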
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.