Vision Transformers for Small Histological Datasets Learned through
Knowledge Distillation
- URL: http://arxiv.org/abs/2305.17370v1
- Date: Sat, 27 May 2023 05:09:03 GMT
- Title: Vision Transformers for Small Histological Datasets Learned through
Knowledge Distillation
- Authors: Neel Kanwal and Trygve Eftestøl and Farbod Khoraminia and Tahlita CM
Zuiverloon and Kjersti Engan
- Abstract summary: Vision Transformers (ViTs) may detect and exclude artifacts before running the diagnostic algorithm.
A simple way to develop robust and generalized ViTs is to train them on massive datasets.
We present a student-teacher recipe to improve the classification performance of ViT for the air bubble detection task.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Computational Pathology (CPATH) systems have the potential to automate
diagnostic tasks. However, the artifacts on the digitized histological glass
slides, known as Whole Slide Images (WSIs), may hamper the overall performance
of CPATH systems. Deep Learning (DL) models such as Vision Transformers (ViTs)
may detect and exclude artifacts before running the diagnostic algorithm. A
simple way to develop robust and generalized ViTs is to train them on massive
datasets. Unfortunately, acquiring large medical datasets is expensive and
inconvenient, prompting the need for a generalized artifact detection method
for WSIs. In this paper, we present a student-teacher recipe to improve the
classification performance of ViT for the air bubble detection task. ViT,
trained under the student-teacher framework, boosts its performance by
distilling existing knowledge from the high-capacity teacher model. Our
best-performing ViT yields an F1-score of 0.961 and an MCC of 0.911, a 7% gain
in MCC over stand-alone training. The proposed method offers a new perspective
on leveraging knowledge distillation, rather than transfer learning, to
encourage the use of customized transformers in efficient preprocessing
pipelines for CPATH systems.
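The recipe above is response-based (soft-target) knowledge distillation. As a reference point, here is a minimal PyTorch sketch of that objective; the temperature, loss weighting, and model/variable names are illustrative assumptions, not the authors' published configuration:

```python
# Minimal soft-target knowledge distillation loss (Hinton-style).
# A sketch, not the paper's exact recipe; temperature and alpha are
# assumed hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=3.0, alpha=0.5):
    """Blend hard-label cross-entropy with softened teacher targets."""
    # KL divergence between temperature-softened class distributions;
    # the T^2 factor keeps its gradient scale comparable to the hard loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Standard cross-entropy against the ground-truth artifact labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# One training step: the teacher stays frozen, only the student updates.
# teacher.eval()
# with torch.no_grad():
#     t_logits = teacher(patches)   # patches: WSI tiles (hypothetical loader)
# s_logits = student(patches)       # student: a compact ViT
# loss = distillation_loss(s_logits, t_logits, labels)
```

The same loss applies whether the high-capacity teacher is a CNN or a larger ViT; only the frozen teacher's forward pass changes.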
Related papers
- Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation
A self-distillation (SSD) training strategy is introduced that filters and weights teacher representations so that the student distills from task-relevant representations only.
Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z) - Optimizing Vision Transformers with Data-Free Knowledge Transfer
Vision transformers (ViTs) have excelled in various computer vision tasks due to their superior ability to capture long-distance dependencies.
We propose compressing large ViT models using Knowledge Distillation (KD), implemented in a data-free manner to circumvent data-availability limitations.
arXiv Detail & Related papers (2024-08-12T07:03:35Z) - An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question whether extremely simple lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical design (5.7M/6.5M parameters) can achieve 79.4%/78.9% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2024-04-18T14:14:44Z) - DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Vision Transformer (ViT) has emerged as a prominent architecture for various computer vision tasks.
ViT requires a large amount of data for pre-training.
We introduce DeiT-LT to tackle the problem of training ViTs from scratch on long-tailed datasets.
arXiv Detail & Related papers (2024-04-03T17:58:21Z) - TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation
Visual place recognition plays a pivotal role in autonomous exploration and navigation of mobile robots.
Existing methods meet the demands of this task by exploiting powerful yet large networks.
We propose a high-performance teacher and lightweight student distillation framework called TSCM.
arXiv Detail & Related papers (2024-04-02T02:29:41Z) - Equipping Computational Pathology Systems with Artifact Processing Pipelines: A Showcase for Computation and Performance Trade-offs
We propose a mixture of experts (MoE) scheme for detecting five notable artifacts, including damaged tissue, blur, folded tissue, air bubbles, and histologically irrelevant blood.
We developed DL pipelines using two MoEs and two multiclass models built on state-of-the-art deep convolutional neural networks (DCNNs) and vision transformers (ViTs).
The proposed MoE yields 86.15% F1 and 97.93% sensitivity scores on unseen data, at a lower inference cost than the MoE built on ViTs (metric computation is sketched after this list).
arXiv Detail & Related papers (2024-03-12T15:22:05Z) - Distilling Knowledge from CNN-Transformer Models for Enhanced Human
Action Recognition
The research aims to enhance the performance and efficiency of smaller student models by transferring knowledge from larger teacher models.
The proposed method employs a Transformer vision network as the student model, while a convolutional network serves as the teacher model.
The Vision Transformer (ViT) architecture is introduced as a robust framework for capturing global dependencies in images.
arXiv Detail & Related papers (2023-11-02T14:57:58Z) - Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z) - Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student (a loop-level sketch of this scheme follows the list).
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
arXiv Detail & Related papers (2023-08-17T17:17:08Z) - Generic-to-Specific Distillation of Masked Autoencoders
We propose generic-to-specific distillation (G2SD) to tap the potential of small ViT models under the supervision of large models pre-trained by masked autoencoders.
With G2SD, the vanilla ViT-Small model achieves 98.7%, 98.1%, and 99.3% of its teacher's performance on image classification, object detection, and semantic segmentation, respectively.
arXiv Detail & Related papers (2023-02-28T17:13:14Z) - Improving Vision Transformers by Revisiting High-frequency Components
We show that Vision Transformer (ViT) models are less effective in capturing the high-frequency components of images than CNN models.
To compensate, we propose HAT, which directly augments high-frequency components of images via adversarial training.
We show that HAT can consistently boost the performance of various ViT models.
arXiv Detail & Related papers (2022-04-03T05:16:51Z) - Benchmarking Detection Transfer Learning with Vision Transformers
The complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN.
Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z) - Efficient Vision Transformers via Fine-Grained Manifold Distillation
Vision transformer architectures have shown extraordinary performance on many computer vision tasks.
Although network performance is boosted, transformers often require more computational resources.
We propose to excavate useful information from the teacher transformer through the relationship between images and their divided patches.
arXiv Detail & Related papers (2021-07-03T08:28:34Z)
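The main paper (0.961 F1, 0.911 MCC) and several entries above report F1, MCC, and sensitivity. For reference, a minimal scikit-learn sketch of these metrics on placeholder predictions (the arrays are illustrative, not any paper's data):

```python
# F1, MCC, and sensitivity (recall) for a binary artifact-detection task;
# placeholder labels only, not any paper's actual evaluation data.
from sklearn.metrics import f1_score, matthews_corrcoef, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground truth: 1 = artifact present
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]  # model predictions

print(f"F1:          {f1_score(y_true, y_pred):.3f}")
print(f"MCC:         {matthews_corrcoef(y_true, y_pred):.3f}")
print(f"Sensitivity: {recall_score(y_true, y_pred):.3f}")
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it a stricter summary than F1 on imbalanced artifact classes.

The multi-teacher entry above describes a sequential, progressive transfer from several teachers to one student. A loop-level sketch of that idea, reusing distillation_loss() from the earlier block (the schedule and logit-level transfer are simplifying assumptions; that paper targets detection, where feature-level distillation is common):

```python
# Sequential multi-teacher distillation: distill one frozen teacher at a
# time into the same student. A sketch, not the paper's exact algorithm.
import torch

def progressive_distill(student, teachers, loader, optimizer,
                        epochs_per_teacher=1):
    for teacher in teachers:            # teacher ordering is a design choice
        teacher.eval()                  # freeze the current teacher
        for _ in range(epochs_per_teacher):
            for patches, labels in loader:
                with torch.no_grad():
                    t_logits = teacher(patches)   # fixed soft targets
                s_logits = student(patches)
                loss = distillation_loss(s_logits, t_logits, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return student
```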