Improved EATFormer: A Vision Transformer for Medical Image Classification
- URL: http://arxiv.org/abs/2403.13167v1
- Date: Tue, 19 Mar 2024 21:40:20 GMT
- Title: Improved EATFormer: A Vision Transformer for Medical Image Classification
- Authors: Yulong Shisu, Susano Mingwin, Yongshuai Wanwag, Zengqiang Chenso, Sunshin Huing,
- Abstract summary: This paper presents an improved Evolutionary Algorithm-based Transformer architecture for medical image classification using Vision Transformers.
The proposed EATFormer architecture combines the strengths of Convolutional Neural Networks and Vision Transformers.
Experimental results on the Chest X-ray and Kvasir datasets demonstrate that the proposed EATFormer significantly improves prediction speed and accuracy compared to baseline models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The accurate analysis of medical images is vital for diagnosing and predicting medical conditions. Traditional approaches relying on radiologists and clinicians suffer from inconsistencies and missed diagnoses. Computer-aided diagnosis systems can assist in achieving early, accurate, and efficient diagnoses. This paper presents an improved Evolutionary Algorithm-based Transformer architecture for medical image classification using Vision Transformers. The proposed EATFormer architecture combines the strengths of Convolutional Neural Networks and Vision Transformers, leveraging their ability to identify patterns in data and adapt to specific characteristics. The architecture incorporates novel components, including the Enhanced EA-based Transformer block with Feed-Forward Network, Global and Local Interaction, and Multi-Scale Region Aggregation modules. It also introduces the Modulated Deformable MSA module for dynamic modeling of irregular locations. The paper discusses the Vision Transformer (ViT) model's key features, such as patch-based processing, positional context incorporation, and the Multi-Head Attention mechanism. It introduces the Multi-Scale Region Aggregation module, which aggregates information from different receptive fields to provide an inductive bias. The Global and Local Interaction module enhances the MSA-based global module by introducing a local path for extracting discriminative local information. Experimental results on the Chest X-ray and Kvasir datasets demonstrate that the proposed EATFormer significantly improves prediction speed and accuracy compared to baseline models.
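The abstract lists patch-based processing and positional context as key ViT features. The NumPy sketch below shows the standard ViT input stage under illustrative assumptions: the image is cut into non-overlapping patches, each patch is flattened and linearly projected, a class token is prepended, and a learned positional vector is added per token. The function names, the 32x32 input, the 8x8 patch size, and the 16-dim embedding are hypothetical choices, not values from the paper.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, patch * patch * C), in raster order.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C)
    tiles = image.reshape(h // patch, patch, w // patch, patch, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(-1, patch * patch * c)


def embed(image, patch, w_proj, cls_token, pos_embed):
    """Project patches to D dims, prepend a class token, and add positional embeddings."""
    tokens = patchify(image, patch) @ w_proj                      # (N, D)
    tokens = np.concatenate([cls_token[None, :], tokens], axis=0)  # (N + 1, D)
    return tokens + pos_embed                                      # (N + 1, D)


# Hypothetical sizes: 32x32 RGB input, 8x8 patches, 16-dim embeddings.
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))
p, d = 8, 16
n = (32 // p) ** 2  # 16 patches
out = embed(
    img,
    p,
    rng.standard_normal((p * p * 3, d)),  # patch projection matrix
    rng.standard_normal(d),               # class token
    rng.standard_normal((n + 1, d)),      # one positional vector per token
)
print(out.shape)  # (17, 16)
```

Self-attention is permutation-equivariant, so without the added positional term the layers that follow could not tell which image region each token came from.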
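The abstract describes Multi-Scale Region Aggregation only as aggregating information from different receptive fields to provide an inductive bias. One plausible reading is sketched below in NumPy, assuming the module is a weighted sum of parallel 3x3 depthwise convolutions with different dilation rates plus a residual connection. The dilation rates, the per-scale weights, and every function name here are assumptions for illustration, not the paper's specification.

```python
import numpy as np


def dilated_depthwise_conv3x3(x, kernels, dilation):
    """Per-channel 3x3 filter whose taps are spaced `dilation` pixels apart.

    A larger dilation enlarges the receptive field (3x3 -> (2d+1)x(2d+1))
    without adding parameters. Zero padding keeps the (H, W, C) shape.
    """
    h, w, _ = x.shape
    d = dilation
    padded = np.pad(x, ((d, d), (d, d), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += padded[i * d:i * d + h, j * d:j * d + w, :] * kernels[i, j, :]
    return out


def msra(x, kernels_per_scale, dilations, scale_weights):
    """Assumed MSRA: weighted sum of branches with different receptive fields, plus a residual."""
    agg = np.zeros_like(x)
    for k, d, a in zip(kernels_per_scale, dilations, scale_weights):
        agg += a * dilated_depthwise_conv3x3(x, k, d)
    return x + agg


# Hypothetical (H, W, C) feature map, scales, and weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
dilations = (1, 2, 3)
kernels = [rng.standard_normal((3, 3, 4)) for _ in dilations]
weights = (0.5, 0.3, 0.2)
y = msra(x, kernels, dilations, weights)
print(y.shape)  # (8, 8, 4)
```

With dilations 1, 2, and 3 the branches see 3x3, 5x5, and 7x7 neighbourhoods while each uses the same nine weights per channel, which is one inexpensive way to mix local context at several scales.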
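The Global and Local Interaction idea of adding a local path next to the MSA-based global path can be sketched as follows. This NumPy version assumes a channel-split design: a fraction `global_ratio` of the channels goes through multi-head self-attention over all tokens, the remaining channels go through a 3x3 depthwise convolution on the token grid, and the two outputs are concatenated. The split ratio, the depthwise-conv local path, and the parameter names (`wq`, `wk`, `wv`, `dw`) are hypothetical and not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, heads):
    """Global path: every token attends to all N tokens. x has shape (N, D)."""
    n, d = x.shape
    dh = d // heads
    # Project, then split channels into heads: (heads, N, dh).
    q, k, v = ((x @ w).reshape(n, heads, dh).transpose(1, 0, 2) for w in (wq, wk, wv))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, N, N)
    return (attn @ v).transpose(1, 0, 2).reshape(n, d)


def depthwise_conv3x3(x, kernels):
    """Local path: per-channel 3x3 filter on an (H, W, C) map with zero padding."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += padded[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out


def gli(x, h, w, global_ratio, params, heads):
    """Assumed GLI: some channels use global attention, the rest a local conv.

    x holds N = h * w tokens in raster order, shape (N, D).
    """
    n, d = x.shape
    dg = int(d * global_ratio)  # channels sent to the global path
    xg, xl = x[:, :dg], x[:, dg:]
    yg = multi_head_self_attention(xg, params["wq"], params["wk"], params["wv"], heads)
    yl = depthwise_conv3x3(xl.reshape(h, w, d - dg), params["dw"]).reshape(n, d - dg)
    return np.concatenate([yg, yl], axis=1)


# Hypothetical sizes: a 4x4 token grid with 16 channels, half of them global.
rng = np.random.default_rng(0)
H, W, D, HEADS, RATIO = 4, 4, 16, 2, 0.5
DG = int(D * RATIO)
params = {
    "wq": rng.standard_normal((DG, DG)),
    "wk": rng.standard_normal((DG, DG)),
    "wv": rng.standard_normal((DG, DG)),
    "dw": rng.standard_normal((3, 3, D - DG)),
}
tokens = rng.standard_normal((H * W, D))
y = gli(tokens, H, W, RATIO, params, HEADS)
print(y.shape)  # (16, 16)
```

Splitting channels rather than running both paths on every channel keeps the quadratic attention cost confined to a subset of the features, while the convolutional branch contributes the locality bias that plain MSA lacks.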
Related papers
- CNN-Transformer Rectified Collaborative Learning for Medical Image Segmentation [60.08541107831459]
This paper proposes a CNN-Transformer rectified collaborative learning framework to learn stronger CNN-based and Transformer-based models for medical image segmentation.
Specifically, we propose a rectified logit-wise collaborative learning (RLCL) strategy which introduces the ground truth to adaptively select and rectify the wrong regions in student soft labels.
We also propose a class-aware feature-wise collaborative learning (CFCL) strategy to achieve effective knowledge transfer between CNN-based and Transformer-based models in the feature space.
arXiv (2024-08-25T01:27:35Z)
- Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer [4.672688418357066]
We propose a novel Transformer Diffusion (DTS) model for robust segmentation in the presence of noise.
Our model, which analyzes the morphological representation of images, outperforms previous models across various medical imaging modalities.
arXiv (2024-08-01T07:35:54Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv (2023-10-22T02:27:02Z)
- SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only preserves both local and global context information but is also capable of capturing long-range dependencies between input elements.
arXiv (2023-10-16T01:13:38Z)
- HST-MRF: Heterogeneous Swin Transformer with Multi-Receptive Field for Medical Image Segmentation [5.51045524851432]
We propose a Heterogeneous Swin Transformer with Multi-Receptive Field (HST-MRF) model for medical image segmentation.
The main purpose is to address the loss of structural information caused by the patch partitioning used in Transformers.
Experimental results show that our proposed method outperforms state-of-the-art models and can achieve superior performance.
arXiv (2023-04-10T14:30:03Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer to enhance discriminative representation learning.
The two proposed modules are lightweight, can be plugged into any transformer network, and are easily trained end-to-end.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv (2022-08-31T03:00:07Z)
- AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv (2022-03-18T13:43:53Z)
- PHTrans: Parallelly Aggregating Global and Local Representations for Medical Image Segmentation [7.140322699310487]
We propose a novel hybrid architecture for medical image segmentation called PHTrans.
PHTrans hybridizes Transformer and CNN in parallel within its main building blocks to produce hierarchical representations from global and local features.
arXiv (2022-03-09T08:06:56Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv (2021-02-21T18:35:14Z)
- Domain Shift in Computer Vision models for MRI data analysis: An Overview [64.69150970967524]
Machine learning and computer vision methods are showing good performance in medical imagery analysis.
Yet only a few applications are now in clinical use.
Poor transferability of the models to data from different sources or acquisition domains is one of the reasons for this.
arXiv (2020-10-14T16:34:21Z)