MedViT: A Robust Vision Transformer for Generalized Medical Image
Classification
- URL: http://arxiv.org/abs/2302.09462v1
- Date: Sun, 19 Feb 2023 02:55:45 GMT
- Title: MedViT: A Robust Vision Transformer for Generalized Medical Image
Classification
- Authors: Omid Nejati Manzari, Hamid Ahmadabadi, Hossein Kashiani, Shahriar B.
Shokouhi, Ahmad Ayatollahi
- Abstract summary: We propose a robust yet efficient CNN-Transformer hybrid model which is equipped with the locality of CNNs and the global connectivity of vision Transformers.
Our proposed hybrid model demonstrates its high robustness and generalization ability compared to the state-of-the-art studies on a large-scale collection of standardized MedMNIST-2D datasets.
- Score: 4.471084427623774
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional Neural Networks (CNNs) have advanced existing medical systems
for automatic disease diagnosis. However, there are still concerns about the
reliability of deep medical diagnosis systems against the potential threats of
adversarial attacks since inaccurate diagnosis could lead to disastrous
consequences in the safety realm. In this study, we propose a highly robust yet
efficient CNN-Transformer hybrid model which is equipped with the locality of
CNNs as well as the global connectivity of vision Transformers. To mitigate the
high quadratic complexity of the self-attention mechanism while jointly
attending to information in various representation subspaces, we construct our
attention mechanism by means of an efficient convolution operation. Moreover,
to alleviate the fragility of our Transformer model against adversarial
attacks, we attempt to learn smoother decision boundaries. To this end, we
augment the shape information of an image in the high-level feature space by
permuting the feature mean and variance within mini-batches. With less
computational complexity, our proposed hybrid model demonstrates its high
robustness and generalization ability compared to the state-of-the-art studies
on a large-scale collection of standardized MedMNIST-2D datasets.
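The abstract's shape-augmentation idea, permuting per-sample feature mean and variance within a mini-batch, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function name, tensor shapes, and the use of a random batch permutation are assumptions, analogous to style-mixing augmentations:

```python
import numpy as np

def permute_feature_statistics(features, seed=None):
    """Swap per-sample channel statistics (mean, std) across a mini-batch.

    features: array of shape (batch, channels, height, width).
    Each sample's features are normalized over the spatial dimensions,
    then re-scaled with the mean/std of a randomly chosen other sample.
    """
    rng = np.random.default_rng(seed)
    batch_size = features.shape[0]
    eps = 1e-6

    # Per-sample, per-channel statistics over the spatial dimensions.
    mu = features.mean(axis=(2, 3), keepdims=True)
    sigma = features.std(axis=(2, 3), keepdims=True) + eps

    # Random pairing of samples within the mini-batch.
    perm = rng.permutation(batch_size)

    # Normalize each sample, then apply another sample's statistics.
    normalized = (features - mu) / sigma
    return normalized * sigma[perm] + mu[perm]
```

Because the content of each feature map is kept while its first- and second-order statistics are exchanged, the augmented batch perturbs texture-like cues but preserves shape-like structure, which is the intuition behind smoothing the decision boundary.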
Related papers
- Harnessing The Power of Attention For Patch-Based Biomedical Image Classification [0.0]
We present a novel architecture based on self-attention mechanisms as an alternative to conventional CNNs.
We introduce the Lanczos5 technique, which adapts variable image sizes to higher resolutions.
Our methods address critical challenges faced by attention-based vision models, including inductive bias, weight sharing, receptive field limitations, and efficient data handling.
arXiv Detail & Related papers (2024-04-01T06:22:28Z)
- Improved EATFormer: A Vision Transformer for Medical Image Classification [0.0]
This paper presents an improved Evolutionary Algorithm-based Transformer (EATFormer) architecture for medical image classification.
The proposed EATFormer architecture combines the strengths of Convolutional Neural Networks and Vision Transformers.
Experimental results on the Chest X-ray and Kvasir datasets demonstrate that the proposed EATFormer significantly improves prediction speed and accuracy compared to baseline models.
arXiv Detail & Related papers (2024-03-19T21:40:20Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - Breast Ultrasound Tumor Classification Using a Hybrid Multitask
CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations.
In this study, we proposed a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
arXiv Detail & Related papers (2023-08-04T01:19:32Z) - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context
Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z) - Self-Supervised Masked Convolutional Transformer Block for Anomaly
Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level.
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z) - Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z) - MSHT: Multi-stage Hybrid Transformer for the ROSE Image Analysis of
Pancreatic Cancer [5.604939010661757]
Pancreatic cancer is one of the most malignant cancers in the world; it progresses rapidly and carries very high mortality.
We propose a hybrid high-performance deep learning model to enable the automated workflow.
A dataset of 4240 ROSE images is collected to evaluate the method in this unexplored field.
arXiv Detail & Related papers (2021-12-27T05:04:11Z) - TransAttUnet: Multi-level Attention-guided U-Net with Transformer for
Medical Image Segmentation [33.45471457058221]
This paper proposes a novel Transformer-based medical image semantic segmentation framework called TransAttUnet.
In particular, we establish additional multi-scale skip connections between decoder blocks to aggregate the different semantic-scale upsampling features.
Our method consistently outperforms the state-of-the-art baselines.
arXiv Detail & Related papers (2021-07-12T09:17:06Z) - ResViT: Residual vision transformers for multi-modal medical image
synthesis [0.0]
We propose ResViT, a novel generative adversarial approach for medical image synthesis that combines the local precision of convolution operators with the contextual sensitivity of vision transformers.
Our results indicate the superiority of ResViT against competing methods in terms of qualitative observations and quantitative metrics.
arXiv Detail & Related papers (2021-06-30T12:57:37Z) - Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.