A New Perspective to Boost Vision Transformer for Medical Image
Classification
- URL: http://arxiv.org/abs/2301.00989v1
- Date: Tue, 3 Jan 2023 07:45:59 GMT
- Title: A New Perspective to Boost Vision Transformer for Medical Image
Classification
- Authors: Yuexiang Li, Yawen Huang, Nanjun He, Kai Ma and Yefeng Zheng
- Abstract summary: We propose a self-supervised learning approach specifically for medical image classification with the Transformer backbone.
Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning.
The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
- Score: 33.215289791017064
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Transformer has achieved impressive success on various computer
vision tasks. However, most existing studies require pretraining the
Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to
achieve satisfactory performance, and such datasets are usually unavailable
for medical images.
Additionally, due to the domain gap between medical and natural images, the
improvement brought by ImageNet pretrained weights degrades significantly when
the weights are transferred to medical image processing tasks. In this
paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised
learning approach specifically for medical image classification with the
Transformer backbone. Our BOLT consists of two networks, namely online and
target branches, for self-supervised representation learning. Concretely, the
online network is trained to predict the target network representation of the
same patch embedding tokens under a different perturbation. To maximally
exploit the Transformer on limited medical data, we propose an auxiliary
difficulty ranking task: the Transformer must identify which branch (i.e.,
online or target) is processing the more difficult perturbed tokens. Overall,
the Transformer learns to distill transformation-invariant features from the
perturbed tokens, simultaneously measuring difficulty and maintaining the
consistency of the self-supervised
representations. The proposed BOLT is evaluated on three medical image
processing tasks, i.e., skin lesion classification, knee fatigue fracture
grading and diabetic retinopathy grading. The experimental results validate the
superiority of our BOLT for medical image classification, compared to ImageNet
pretrained weights and state-of-the-art self-supervised learning approaches.
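The abstract describes the mechanism but not an implementation. Below is a minimal PyTorch sketch of what a BOLT-style training step could look like, assuming a BYOL-like online/target pair sharing one Transformer backbone, an exponential-moving-average target update, a cosine consistency loss, and a linear head for the auxiliary difficulty ranking task. The class name, perturbation inputs, head designs, and equal loss weighting are all illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch only: a BYOL-style online/target pair with an auxiliary
# difficulty-ranking head. Names, heads, and loss weighting are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class BOLTSketch(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int = 768, ema: float = 0.99):
        super().__init__()
        self.online = encoder                   # Transformer backbone (online branch)
        self.target = copy.deepcopy(encoder)    # frozen EMA copy (target branch)
        for p in self.target.parameters():
            p.requires_grad = False
        self.predictor = nn.Sequential(         # BYOL-style prediction head
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.rank_head = nn.Linear(2 * dim, 2)  # "which branch saw harder tokens?"
        self.ema = ema

    @torch.no_grad()
    def update_target(self):
        # Exponential moving average of the online weights, as in BYOL.
        for po, pt in zip(self.online.parameters(), self.target.parameters()):
            pt.mul_(self.ema).add_((1.0 - self.ema) * po)

    def forward(self, tokens_a, tokens_b, harder_is_a):
        # tokens_a / tokens_b: the same patch-embedding tokens under two
        # perturbations of different strength; each encoder is assumed to
        # pool them to a (batch, dim) representation.
        # harder_is_a: long tensor of 0/1 labels (1 if tokens_a is harder).
        z_online = self.predictor(self.online(tokens_a))
        with torch.no_grad():
            z_target = self.target(tokens_b)
        # Consistency: the online branch predicts the target representation.
        consistency = 2.0 - 2.0 * F.cosine_similarity(
            z_online, z_target, dim=-1).mean()
        # Auxiliary difficulty ranking over both branches' representations.
        logits = self.rank_head(torch.cat([z_online, z_target], dim=-1))
        ranking = F.cross_entropy(logits, harder_is_a)
        return consistency + ranking            # equal weighting is an assumption
```

In training, `update_target()` would be called after each optimizer step on the online branch, as in BYOL.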
Related papers
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical
Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
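As a rough illustration of the masking-plus-perturbation idea (not the paper's actual framework), the following hedged PyTorch sketch corrupts a 3D volume with random local masking and additive noise, then trains an autoencoder to reconstruct the clean input; the patch size, mask ratio, noise level, and L1 objective are assumptions.

```python
import torch
import torch.nn.functional as F

def disrupt(volume: torch.Tensor, patch: int = 8, mask_ratio: float = 0.4,
            noise_std: float = 0.1) -> torch.Tensor:
    # volume: (B, C, D, H, W) with D, H, W divisible by `patch`.
    # Zero out random local patches, then add a low-level perturbation (noise).
    b, _, d, h, w = volume.shape
    grid = (d // patch, h // patch, w // patch)
    keep = (torch.rand(b, 1, *grid, device=volume.device) > mask_ratio).float()
    mask = keep.repeat_interleave(patch, dim=2) \
               .repeat_interleave(patch, dim=3) \
               .repeat_interleave(patch, dim=4)
    return volume * mask + noise_std * torch.randn_like(volume)

def pretrain_step(autoencoder, volume, optimizer):
    # One reconstruction step: the autoencoder sees the disrupted volume and
    # is supervised by the original, clean one.
    optimizer.zero_grad()
    loss = F.l1_loss(autoencoder(disrupt(volume)), volume)
    loss.backward()
    optimizer.step()
    return loss.item()
```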
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision Transformer (ViT) to encourage recovered images to be sharp without sacrificing performance on quantitative metrics.
By comparing the Transformer features of the recovered image and the target one, the pretrained Transformer provides high-resolution, blur-sensitive semantic information.
One approach regards the features as vectors and computes the discrepancy between the representations of the recovered and target images in Euclidean space, as sketched below.
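A minimal sketch of that Euclidean-space comparison, assuming a frozen timm ViT as the feature extractor; the model choice, use of full token features, and MSE form are illustrative, not necessarily the paper's exact setup.

```python
import timm
import torch
import torch.nn.functional as F

# Frozen, pretrained ViT used purely as a feature extractor (assumed model).
vit = timm.create_model("vit_base_patch16_224", pretrained=True)
vit.eval().requires_grad_(False)

def vit_feature_loss(recovered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Both inputs: (B, 3, 224, 224), normalized as the ViT expects.
    f_rec = vit.forward_features(recovered)   # (B, tokens, dim)
    f_tgt = vit.forward_features(target)
    return F.mse_loss(f_rec, f_tgt)           # squared Euclidean discrepancy
```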
arXiv Detail & Related papers (2023-03-24T14:14:25Z)
- Understanding Transfer Learning for Chest Radiograph Clinical Report Generation with Modified Transformer Architectures [0.0]
We train a series of modified transformers to generate clinical reports from chest radiograph image input.
We use BLEU (1-4), ROUGE-L, CIDEr, and the clinical CheXbert F1 scores to validate our models and demonstrate scores competitive with state-of-the-art models.
arXiv Detail & Related papers (2022-05-05T03:08:05Z)
- Semi-Supervised Vision Transformers [76.83020291497895]
We study the training of Vision Transformers for semi-supervised image classification.
We find that Vision Transformers perform poorly in a semi-supervised ImageNet setting, while CNNs achieve superior results in the small labeled-data regime.
arXiv Detail & Related papers (2021-11-22T09:28:13Z)
- Transformer-Unet: Raw Image Processing with Unet [4.7944896477309555]
We propose Transformer-Unet by adding Transformer modules that operate on raw images rather than on feature maps in Unet.
We form an end-to-end network and obtain segmentation results better than many previous Unet-based algorithms in our experiments.
arXiv Detail & Related papers (2021-09-17T09:03:10Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is a Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Pyramid Medical Transformer for Medical Image Segmentation [8.157373686645318]
We develop a novel method to integrate multi-scale attention and CNN feature extraction using a pyramidal network architecture, namely the Pyramid Medical Transformer (PMTrans).
Experimental results on two medical image datasets (gland segmentation and MoNuSeg) showed that PMTrans outperformed the latest CNN-based and Transformer-based models for medical image segmentation.
arXiv Detail & Related papers (2021-04-29T23:57:20Z)
- TransMed: Transformers Advance Multi-modal Medical Image Classification [4.500880052705654]
Convolutional neural networks (CNNs) have shown very competitive performance in medical image analysis tasks.
Transformers have been applied to computer vision and achieved remarkable success on large-scale datasets.
TransMed combines the advantages of CNNs and Transformers to efficiently extract low-level image features.
arXiv Detail & Related papers (2021-03-10T08:57:53Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
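As a rough illustration only: the sketch below implements a simplified gated axial attention in PyTorch, attending along one spatial axis at a time with a learnable gate on the attention update. In the paper the gates modulate the relative positional terms inside axial attention; here, for brevity, the gate scales the attention residual, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GatedAxialAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Learnable gate, initialized small so the attention update starts
        # with limited influence and can open up as training progresses.
        self.gate = nn.Parameter(torch.full((1,), 0.1))

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        # x: (B, C, H, W); attend along a single spatial axis at a time.
        b, c, h, w = x.shape
        if axis == 2:   # height axis: each column becomes a sequence
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        else:           # width axis: each row becomes a sequence
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq)
        out = seq + self.gate * out               # gated residual update
        if axis == 2:
            return out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)
```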
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the U-shaped architecture, also known as U-Net, has become the de facto standard.
We propose TransUNet, which combines the merits of both Transformers and U-Net, as a strong alternative for medical image segmentation.
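For orientation, here is a compact, hedged sketch of the hybrid pattern this line of work popularized: a CNN encoder, a Transformer over the flattened feature map for global context, and a U-Net-style decoder with a skip connection. Channel counts, depths, and layer choices are illustrative, not TransUNet's actual configuration.

```python
import torch
import torch.nn as nn

class TinyTransUNet(nn.Module):
    def __init__(self, in_ch=1, base=32, dim=128, n_classes=2):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, dim, 3, stride=2, padding=1), nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.up = nn.ConvTranspose2d(dim, base, 2, stride=2)
        self.head = nn.Conv2d(2 * base, n_classes, 1)   # after skip concat

    def forward(self, x):
        s1 = self.enc1(x)                 # (B, base, H, W) skip feature
        f = self.enc2(s1)                 # (B, dim, H/2, W/2)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)        # (B, H*W/4, dim)
        tokens = self.transformer(tokens)            # global self-attention
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        u = self.up(f)                    # back to (B, base, H, W)
        return self.head(torch.cat([u, s1], dim=1))  # skip connection + logits
```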
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.