A comparative study between vision transformers and CNNs in digital
pathology
- URL: http://arxiv.org/abs/2206.00389v1
- Date: Wed, 1 Jun 2022 10:41:11 GMT
- Title: A comparative study between vision transformers and CNNs in digital
pathology
- Authors: Luca Deininger, Bernhard Stimpel, Anil Yuce, Samaneh
Abbasi-Sureshjani, Simon Schönenberger, Paolo Ocampo, Konstanty Korski,
Fabien Gaire
- Abstract summary: This work explores vision transformers for tumor detection in digital pathology whole slide images in four tissue types.
We compared the vision transformer DeiT-Tiny to the state-of-the-art convolutional neural network ResNet18.
- The results show that the vision transformer performed slightly better than the ResNet18 on tumor detection for three of the four tissue types, while the ResNet18 performed slightly better on the remaining tasks.
- Score: 1.71601014035428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, vision transformers were shown to be capable of outperforming
convolutional neural networks when pretrained on sufficient amounts of data. In
comparison to convolutional neural networks, vision transformers have a weaker
inductive bias and therefore allow a more flexible feature detection. Due to
their promising feature detection, this work explores vision transformers for
tumor detection in digital pathology whole slide images in four tissue types,
and for tissue type identification. We compared the patch-wise classification
performance of the vision transformer DeiT-Tiny to the state-of-the-art
convolutional neural network ResNet18. Due to the sparse availability of
annotated whole slide images, we further compared both models pretrained on
large amounts of unlabeled whole-slide images using state-of-the-art
self-supervised approaches. The results show that the vision transformer
performed slightly better than the ResNet18 on tumor detection for three of the
four tissue types, while the ResNet18 performed slightly better on the remaining
tasks. The aggregated predictions of both models at the slide level were
correlated, indicating that the models captured similar imaging features.
Altogether, the vision transformer models performed on par with the ResNet18
while requiring more effort to train. In order to surpass the performance of
convolutional neural networks, vision transformers might require more
challenging tasks to benefit from their weak inductive bias.
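As a rough illustration of the patch-wise setup described in the abstract, the sketch below compares DeiT-Tiny and ResNet18 on a batch of patches; the timm model names, the two-class tumor head, and mean-probability slide aggregation are assumptions for illustration, not the authors' code.

```python
# Rough illustration of the patch-wise comparison described above (not the authors'
# code). Assumed for illustration: timm model names, a binary tumor/non-tumor head,
# 224x224 patches, and mean tumor probability as the slide-level aggregation.
import timm
import torch

def build_patch_classifier(arch: str, num_classes: int = 2) -> torch.nn.Module:
    # Each backbone gets a fresh classification head for tumor vs. non-tumor patches.
    return timm.create_model(arch, pretrained=True, num_classes=num_classes)

deit = build_patch_classifier("deit_tiny_patch16_224").eval()
resnet = build_patch_classifier("resnet18").eval()

# A whole-slide image is tiled into fixed-size patches; random tensors stand in here.
patches = torch.randn(8, 3, 224, 224)

with torch.no_grad():
    deit_probs = torch.softmax(deit(patches), dim=1)[:, 1]
    resnet_probs = torch.softmax(resnet(patches), dim=1)[:, 1]

# Aggregating patch predictions (here: averaging) yields slide-level scores, which is
# where the correlated slide-level behaviour of the two models could be inspected.
print(float(deit_probs.mean()), float(resnet_probs.mean()))
```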
Related papers
- CNN and ViT Efficiency Study on Tiny ImageNet and DermaMNIST Datasets [0.0]
We introduce a fine-tuning strategy applied to four Vision Transformer variants (Tiny, Small, Base, Large) on DermatologyMNIST and TinyImageNet. We demonstrate that appropriately fine-tuned Vision Transformers can match or exceed the baseline's performance, achieve faster inference, and operate with fewer parameters.
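As a hedged sketch only, fine-tuning one such ViT variant with timm could look like the following; the model name, class count, and optimizer settings are illustrative assumptions rather than the paper's actual recipe.

```python
# Generic fine-tuning loop for a single ViT variant via timm (illustrative only;
# the paper's exact strategy, hyperparameters, and data pipeline are not shown here).
# `train_loader` is a hypothetical DataLoader of (image, label) batches; num_classes=7
# assumes a DermaMNIST-style 7-class problem.
import timm
import torch

model = timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=7)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_one_epoch(model, train_loader):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```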
arXiv Detail & Related papers (2025-05-13T06:17:18Z)
- Artificial intelligence application in lymphoma diagnosis: from Convolutional Neural Network to Vision Transformer [34.04248949660201]
We compare the classification performance of vision transformer to our previously designed convolutional neural network on the same dataset.
To the best of the authors' knowledge, this is the first direct comparison of predictive performance between a vision transformer model and a convolutional neural network model.
arXiv Detail & Related papers (2025-04-05T02:33:34Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Image Deblurring by Exploring In-depth Properties of Transformer [86.7039249037193]
We leverage deep features extracted from a pretrained vision transformer (ViT) to encourage recovered images to be sharp without sacrificing the performance measured by the quantitative metrics.
By comparing the transformer features of the recovered image and the target image, the pretrained transformer provides high-resolution, blur-sensitive semantic information.
One approach regards the features as vectors and computes the discrepancy between the representations extracted from the recovered and target images in Euclidean space.
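A minimal sketch of such a feature-space discrepancy, assuming DeiT-Tiny from timm as the pretrained ViT and a plain mean squared distance over token features; the paper's exact formulation may differ:

```python
# Sketch of a ViT feature-space discrepancy between a recovered and a target image
# (assumptions: DeiT-Tiny via timm as the pretrained ViT and a plain mean squared
# Euclidean distance over token features; the paper's exact loss may differ).
import timm
import torch

vit = timm.create_model("deit_tiny_patch16_224", pretrained=True, num_classes=0).eval()

def feature_discrepancy(recovered: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # forward_features returns token-level ViT features; treating them as vectors,
    # the discrepancy is their mean squared distance in Euclidean space.
    with torch.no_grad():
        target_feat = vit.forward_features(target)
    recovered_feat = vit.forward_features(recovered)
    return torch.mean((recovered_feat - target_feat) ** 2)

recovered = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a deblurring output
target = torch.rand(1, 3, 224, 224)
loss = feature_discrepancy(recovered, target)
loss.backward()  # gradients can flow back into the network that produced `recovered`
```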
arXiv Detail & Related papers (2023-03-24T14:14:25Z)
- View-Disentangled Transformer for Brain Lesion Detection [50.4918615815066]
We propose a novel view-disentangled transformer to enhance the extraction of MRI features for more accurate tumour detection.
First, the proposed transformer harvests long-range correlation among different positions in a 3D brain scan.
Second, the transformer models a stack of slice features as multiple 2D views and enhances these features view by view.
Third, we deploy the proposed transformer module in a transformer backbone, which can effectively detect the 2D regions surrounding brain lesions.
arXiv Detail & Related papers (2022-09-20T11:58:23Z)
- Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology [5.164102666113966]
We conduct a search for good representations in pathology by training a variety of self-supervised models with validation on a variety of weakly-supervised and patch-level tasks.
Our key finding is in discovering that Vision Transformers using DINO-based knowledge distillation are able to learn data-efficient and interpretable features in histology images.
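A short sketch of extracting such self-supervised features, assuming the publicly released DINO ViT-S/16 weights via torch.hub and random tensors in place of histology patches:

```python
# Sketch of extracting self-supervised DINO features from image patches. The torch.hub
# entry point from the public facebookresearch/dino repository is assumed; random
# tensors stand in for histology patches.
import torch

# ViT-Small/16 trained with DINO self-distillation (no labels).
dino_vit = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
dino_vit.eval()

patches = torch.randn(4, 3, 224, 224)  # placeholder for histology image patches

with torch.no_grad():
    embeddings = dino_vit(patches)  # CLS-token embeddings, shape (4, 384)

# Such embeddings can feed a lightweight classifier or clustering for the
# weakly-supervised and patch-level tasks mentioned above.
print(embeddings.shape)
```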
arXiv Detail & Related papers (2022-03-01T16:14:41Z)
- Feature-level augmentation to improve robustness of deep neural networks to affine transformations [22.323625542814284]
Recent studies revealed that convolutional neural networks do not generalize well to small image transformations.
We propose to introduce data augmentation at intermediate layers of the neural architecture.
This improves the capacity of the neural network to cope with such transformations.
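One way such feature-level augmentation could be realized is sketched below, assuming a random rotation warped onto an intermediate feature map via grid sampling; the transform family and insertion point are illustrative:

```python
# One possible realization of feature-level augmentation: a random rotation applied to
# an intermediate feature map instead of the input image. The rotation range and the
# point in the network where this would be inserted are illustrative assumptions.
import math
import torch
import torch.nn.functional as F

def random_rotate_features(feat: torch.Tensor, max_deg: float = 10.0) -> torch.Tensor:
    # feat: (B, C, H, W) activations from some intermediate layer.
    b = feat.size(0)
    angles = (torch.rand(b) * 2 - 1) * math.radians(max_deg)
    cos, sin = torch.cos(angles), torch.sin(angles)
    # Batch of 2x3 affine (rotation) matrices for affine_grid.
    theta = torch.zeros(b, 2, 3, device=feat.device)
    theta[:, 0, 0], theta[:, 0, 1] = cos, -sin
    theta[:, 1, 0], theta[:, 1, 1] = sin, cos
    grid = F.affine_grid(theta, list(feat.shape), align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)

features = torch.randn(2, 64, 56, 56)  # e.g. activations after an early conv block
augmented = random_rotate_features(features)
```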
arXiv Detail & Related papers (2022-02-10T17:14:58Z)
- Class-Aware Generative Adversarial Transformers for Medical Image Segmentation [39.14169989603906]
We present CA-GANformer, a novel type of generative adversarial transformers, for medical image segmentation.
First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations.
We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures.
arXiv Detail & Related papers (2022-01-26T03:50:02Z)
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than 2x improvement on efficiency compared to state-of-the-art vision transformers with only 0.8% drop of accuracy.
arXiv Detail & Related papers (2021-11-30T18:57:02Z)
- Evaluating Transformer based Semantic Segmentation Networks for Pathological Image Segmentation [2.7029872968576947]
Histopathology has played an essential role in cancer diagnosis.
Various CNN-based automated pathological image segmentation approaches have been developed in computer-assisted pathological image analysis.
As a new deep learning paradigm, Transformer neural networks have shown the unique merit of capturing global long-range dependencies across the entire image.
arXiv Detail & Related papers (2021-08-26T18:46:43Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
- Understanding Robustness of Transformers for Image Classification [34.51672491103555]
Vision Transformer (ViT) has surpassed ResNets for image classification.
Differences in the Transformer architecture lead one to wonder whether these networks are as robust as ResNets.
We find that ViT models are at least as robust as the ResNet counterparts on a broad range of perturbations.
arXiv Detail & Related papers (2021-03-26T16:47:55Z)
- A Survey on Visual Transformer [126.56860258176324]
Transformer is a type of deep neural network mainly based on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them according to task and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.