MedViT: A Robust Vision Transformer for Generalized Medical Image
Classification
- URL: http://arxiv.org/abs/2302.09462v1
- Date: Sun, 19 Feb 2023 02:55:45 GMT
- Authors: Omid Nejati Manzari, Hamid Ahmadabadi, Hossein Kashiani, Shahriar B.
Shokouhi, Ahmad Ayatollahi
- Abstract summary: We propose a robust yet efficient CNN-Transformer hybrid model that combines the locality of CNNs with the global connectivity of vision Transformers.
Our proposed hybrid model demonstrates its high robustness and generalization ability compared to the state-of-the-art studies on a large-scale collection of standardized MedMNIST-2D datasets.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional Neural Networks (CNNs) have advanced existing medical systems
for automatic disease diagnosis. However, concerns remain about the reliability of
deep medical diagnosis systems against potential adversarial attacks, since an
inaccurate diagnosis could have disastrous consequences for patient safety. In this
study, we propose a highly robust yet
efficient CNN-Transformer hybrid model which is equipped with the locality of
CNNs as well as the global connectivity of vision Transformers. To mitigate the
high quadratic complexity of the self-attention mechanism while jointly
attending to information in various representation subspaces, we construct our
attention mechanism by means of an efficient convolution operation. Moreover,
to alleviate the fragility of our Transformer model against adversarial
attacks, we attempt to learn smoother decision boundaries. To this end, we
augment the shape information of an image in the high-level feature space by
permuting the feature mean and variance within mini-batches. With less
computational complexity, our proposed hybrid model demonstrates its high
robustness and generalization ability compared to the state-of-the-art studies
on a large-scale collection of standardized MedMNIST-2D datasets.
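The mean/variance permutation described in the abstract can be sketched as follows. This is a hypothetical NumPy illustration of the general idea, not the paper's implementation; the function name `permute_feature_stats` and all shapes are our own assumptions.

```python
import numpy as np

def permute_feature_stats(x, seed=0, eps=1e-6):
    """Swap channel-wise mean/std between samples in a mini-batch.

    x: feature maps of shape (B, C, H, W). Each sample keeps its
    normalized content but adopts the statistics of another sample,
    augmenting shape/style information in the feature space.
    """
    rng = np.random.default_rng(seed)
    mu = x.mean(axis=(2, 3), keepdims=True)          # per-sample channel means
    sigma = x.std(axis=(2, 3), keepdims=True) + eps  # per-sample channel stds
    content = (x - mu) / sigma                       # strip instance statistics
    perm = rng.permutation(x.shape[0])               # shuffle the batch order
    return content * sigma[perm] + mu[perm]          # re-apply permuted stats

# usage: augment a batch of 4 high-level feature maps
feats = np.random.default_rng(1).normal(size=(4, 8, 7, 7))
aug = permute_feature_stats(feats)
```

Because each sample's content is normalized before the statistics of a random other sample are applied, the network sees the same semantic content under varied feature statistics, which is one way to encourage smoother decision boundaries.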
Related papers
- TransUNext: towards a more advanced U-shaped framework for automatic vessel segmentation in the fundus image [19.16680702780529]
We propose TransUNext, a more advanced U-shaped architecture that hybridizes Transformer and CNN components.
The Global Multi-Scale Fusion (GMSF) module is further introduced to upgrade skip-connections, fuse high-level semantic and low-level detailed information, and eliminate high- and low-level semantic differences.
arXiv Detail & Related papers (2024-11-05T01:44:22Z)
- TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation [6.013821375459473]
We introduce a novel deep learning architecture for medical image segmentation.
Our proposed model shows consistent improvement over the state of the art on ten publicly available datasets.
arXiv Detail & Related papers (2024-09-05T09:14:03Z)
- Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network that combines convolutional neural network (CNN) and transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z)
- CAF-YOLO: A Robust Framework for Multi-Scale Lesion Detection in Biomedical Imagery [0.0682074616451595]
CAF-YOLO is a nimble yet robust method for medical object detection that leverages the strengths of convolutional neural networks (CNNs) and transformers.
The ACFM module enhances the modeling of both global and local features, enabling the capture of long-range feature dependencies.
The MSNN module improves multi-scale information aggregation by extracting features across diverse scales.
arXiv Detail & Related papers (2024-08-04T01:44:44Z)
- Harnessing The Power of Attention For Patch-Based Biomedical Image Classification [0.0]
We present a novel architecture based on self-attention mechanisms as an alternative to conventional CNNs.
We introduce the Lancoz5 technique, which adapts variable image sizes to higher resolutions.
Our methods address critical challenges faced by attention-based vision models, including inductive bias, weight sharing, receptive field limitations, and efficient data handling.
arXiv Detail & Related papers (2024-04-01T06:22:28Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations.
In this study, we propose a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
arXiv Detail & Related papers (2023-08-04T01:19:32Z)
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information, to the extent that it can achieve the same performance with as little as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
- Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level.
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
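The gradient-based calibration that GradABM enables can be illustrated with a toy epidemic model. This sketch is not GradABM's implementation: it uses a simple discrete-time SIR update and finite-difference gradients as a stand-in for the automatic differentiation the paper describes, and all function names are hypothetical.

```python
def simulate_sir(beta, gamma=0.1, s0=990.0, i0=10.0, steps=5):
    """Discrete-time SIR epidemic model; the outcome is a smooth
    function of the transmission rate beta, so gradients exist."""
    n = s0 + i0
    s, i = s0, i0
    for _ in range(steps):
        new_inf = beta * s * i / n                 # new infections this step
        s, i = s - new_inf, i + new_inf - gamma * i
    return i                                       # infected count after `steps` days

def grad_wrt_beta(beta, h=1e-6):
    """Central finite-difference gradient of the outcome w.r.t. beta."""
    return (simulate_sir(beta + h) - simulate_sir(beta - h)) / (2 * h)

# the sign of the gradient tells a calibration loop which way to move beta
g = grad_wrt_beta(0.3)
```

A differentiable simulator replaces such finite differences with exact gradients from automatic differentiation, which is what makes coupling the simulator to neural networks practical at scale.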
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- MSHT: Multi-stage Hybrid Transformer for the ROSE Image Analysis of Pancreatic Cancer [5.604939010661757]
Pancreatic cancer is one of the most malignant cancers in the world; it progresses rapidly and carries very high mortality.
We propose a hybrid high-performance deep learning model to enable the automated workflow.
A dataset of 4240 ROSE images is collected to evaluate the method in this unexplored field.
arXiv Detail & Related papers (2021-12-27T05:04:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.