Scopeformer: n-CNN-ViT Hybrid Model for Intracranial Hemorrhage Classification
- URL: http://arxiv.org/abs/2107.04575v1
- Date: Wed, 7 Jul 2021 20:20:24 GMT
- Title: Scopeformer: n-CNN-ViT Hybrid Model for Intracranial Hemorrhage Classification
- Authors: Yassine Barhoumi, Ghulam Rasool
- Abstract summary: We propose a feature generator composed of an ensemble of convolutional neural networks (CNNs) to improve Vision Transformer (ViT) models.
We show that by gradually stacking several feature maps extracted using multiple Xception CNNs, we can develop a feature-rich input for the ViT model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a feature generator backbone composed of an ensemble of
convolutional neural networks (CNNs) to improve the recently emerging Vision
Transformer (ViT) models. We tackled the RSNA intracranial hemorrhage
classification problem, i.e., identifying various hemorrhage types from
computed tomography (CT) slices. We show that by gradually stacking several
feature maps extracted using multiple Xception CNNs, we can develop a
feature-rich input for the ViT model. Our approach allowed the ViT model to pay
attention to relevant features at multiple levels. Moreover, pretraining the n
CNNs using various paradigms leads to a diverse feature set and further
improves the performance of the proposed n-CNN-ViT. We achieved a test accuracy
of 98.04% with a weighted logarithmic loss value of 0.0708. The proposed
architecture is modular and scalable in both the number of CNNs used for
feature extraction and the size of the ViT.
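To make the pipeline concrete, the following is a minimal PyTorch sketch of the n-CNN-ViT idea: feature maps from n Xception backbones are stacked channel-wise, projected to the transformer width, and passed through a ViT-style encoder, trained against a competition-style weighted log loss. The timm model name, the 1x1-conv projection, the encoder dimensions, and the label ordering/weights in the loss are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of the n-CNN-ViT described above (assumptions: timm's
# "xception41" backbone, a 1x1-conv projection, and a generic
# TransformerEncoder standing in for the ViT).
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

class NCnnViT(nn.Module):
    def __init__(self, n_cnns: int = 3, num_classes: int = 6, dim: int = 768):
        super().__init__()
        # n Xception feature extractors; pretraining each under a different
        # paradigm (per the abstract) diversifies the stacked feature set.
        self.backbones = nn.ModuleList([
            timm.create_model("xception41", pretrained=True,
                              num_classes=0, global_pool="")
            for _ in range(n_cnns)
        ])
        feat_ch = self.backbones[0].num_features  # 2048 per Xception
        self.proj = nn.Conv2d(n_cnns * feat_ch, dim, kernel_size=1)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=12,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack the n CNN feature maps channel-wise: (B, n*2048, H, W)
        feats = torch.cat([b(x) for b in self.backbones], dim=1)
        tokens = self.proj(feats).flatten(2).transpose(1, 2)  # (B, H*W, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(out[:, 0])  # one logit per hemorrhage label

def weighted_log_loss(logits, targets, weights=(2., 1., 1., 1., 1., 1.)):
    # RSNA-style weighted multi-label log loss; the label ordering ("any"
    # first, weighted twice) is an assumption about the metric.
    w = torch.tensor(weights, device=logits.device)
    bce = F.binary_cross_entropy_with_logits(logits, targets.float(),
                                             reduction="none")
    return (bce * w).sum(dim=1).div(w.sum()).mean()
```

Scaling the number of backbones n or the encoder depth/width follows directly from this structure, which mirrors the modularity and scalability claim at the end of the abstract.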
Related papers
- TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation [6.013821375459473]
We introduce a novel deep learning architecture for medical image segmentation.
Our proposed model shows consistent improvement over the state of the art on ten publicly available datasets.
arXiv Detail & Related papers (2024-09-05T09:14:03Z)
- Channel Boosted CNN-Transformer-based Multi-Level and Multi-Scale Nuclei Segmentation [0.40964539027092917]
Nuclei segmentation is an essential foundation for various applications in computational pathology, including cancer diagnosis and treatment planning.
Achieving accurate segmentation remains challenging due to factors like clustered nuclei, high intra-class variability in size and shape, resemblance to other cells, and color or contrast variations between nuclei and background.
We propose two CNN-Transformer architectures that leverage the strengths of both CNNs and Transformers to effectively learn nuclei boundaries in multi-organ histology images.
arXiv Detail & Related papers (2024-07-27T05:54:05Z)
- A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases [0.0]
Vision Transformers (ViT) are powerful tools due to their scalability and ability to process large amounts of data.
We fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset.
Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases.
arXiv Detail & Related papers (2024-05-31T23:56:42Z)
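As a rough illustration of the fine-tuning recipe the comparative study above describes, the sketch below adapts an ImageNet-pretrained ViT to 14 chest-disease labels with a sigmoid/BCE head; the timm model name and hyperparameters are assumptions, not the study's exact setup.

```python
# Hedged sketch: fine-tune a pretrained ViT for 14-label multi-label
# chest X-ray classification (model name and details are illustrative).
import torch
import timm

model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=14)  # one logit per disease
criterion = torch.nn.BCEWithLogitsLoss()   # multi-label => sigmoid + BCE
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)       # stand-in for an X-ray batch
labels = torch.randint(0, 2, (8, 14)).float()

logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```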
- Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels [56.69755544814834]
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures.
This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers.
arXiv Detail & Related papers (2024-01-25T19:05:53Z)
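For readers unfamiliar with the depthwise kernels this paper analyzes, here is a minimal depthwise-separable convolution in PyTorch; the `groups=in_channels` construction is standard, while the channel sizes are arbitrary.

```python
# A depthwise-separable convolution: a per-channel (depthwise) filter
# followed by a 1x1 (pointwise) mixing layer. The trained depthwise
# kernels are the weights in which the paper finds identifiable clusters.
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch)  # one kxk kernel per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```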
- Continuous time recurrent neural networks: overview and application to forecasting blood glucose in the intensive care unit [56.801856519460465]
Continuous time autoregressive recurrent neural networks (CTRNNs) are deep learning models that account for irregularly timed observations.
We demonstrate the application of these models to probabilistic forecasting of blood glucose in a critical care setting.
arXiv Detail & Related papers (2023-04-14T09:39:06Z)
- Efficient Scopeformer: Towards Scalable and Rich Feature Extraction for Intracranial Hemorrhage Detection [0.7734726150561088]
"Scopeformer" is a novel multi-CNN-ViT model for intracranial hemorrhage classification in computed tomography (CT) images.
We propose effective feature projection methods to reduce redundancies among CNN-generated features and to control the input size of ViTs.
Experiments with various Scopeformer models show that the model performance is proportional to the number of convolutional blocks employed in the feature extractor.
arXiv Detail & Related papers (2023-02-01T03:51:27Z)
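One way to read the feature projection idea in Efficient Scopeformer: compress the concatenated CNN features before tokenization so the ViT input size stays fixed as more CNNs are added. The 1x1-conv reduction below is a plausible sketch under that reading, not necessarily the projection method the paper uses.

```python
# Hedged sketch of feature projection: reduce redundant channels from n
# stacked CNN feature maps to a fixed ViT embedding width (the paper's
# specific reduction method may differ).
import torch
import torch.nn as nn

n_cnns, feat_ch, vit_dim = 4, 2048, 768
project = nn.Sequential(
    nn.Conv2d(n_cnns * feat_ch, vit_dim, kernel_size=1),  # channel reduction
    nn.BatchNorm2d(vit_dim),
    nn.GELU(),
)
stacked = torch.randn(2, n_cnns * feat_ch, 10, 10)    # n CNNs' feature maps
tokens = project(stacked).flatten(2).transpose(1, 2)  # (2, 100, 768) for ViT
print(tokens.shape)
```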
- Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets [91.25055890980084]
An extreme performance gap remains between Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) when training from scratch on small datasets.
We propose the Dynamic Hybrid Vision Transformer (DHVT) as a solution that enhances the convolutional inductive biases missing from ViTs.
DHVT achieves state-of-the-art performance with lightweight models: 85.68% on CIFAR-100 with 22.8M parameters and 82.3% on ImageNet-1K with 24.0M parameters.
arXiv Detail & Related papers (2022-10-12T06:54:39Z)
- Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation [11.575821326313607]
We propose Video-TransUNet, a deep architecture for segmentation in medical CT videos constructed by integrating temporal feature blending into the TransUNet deep learning framework.
In particular, our approach amalgamates strong frame representation via a ResNet CNN backbone, multi-frame feature blending via a Temporal Context Module, and reconstructive capabilities for multiple targets via a UNet-based convolutional-deconvolutional architecture with multiple heads.
arXiv Detail & Related papers (2022-08-17T14:28:58Z)
- Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to chest radiograph classification despite their high performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention, so unlike CNNs they encode no prior knowledge of local connectivity.
Our results show that while ViTs and CNNs perform on par, with a small benefit for ViTs, DeiTs outperform plain ViTs when a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
- Medulloblastoma Tumor Classification using Deep Transfer Learning with Multi-Scale EfficientNets [63.62764375279861]
We propose an end-to-end medulloblastoma (MB) tumor classification pipeline and explore transfer learning with various input sizes and matching network dimensions.
Using a data set with 161 cases, we demonstrate that pre-trained EfficientNets with larger input resolutions lead to significant performance improvements.
arXiv Detail & Related papers (2021-09-10T13:07:11Z)
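A brief sketch of the matching-resolution idea from the EfficientNet study above: each variant is fine-tuned at its native input size. The timm model names and resolutions below follow the standard EfficientNet scaling; the two-class head is illustrative.

```python
# Hedged sketch: pair each EfficientNet variant with its native input
# resolution for transfer learning (the 2-class head is an assumption).
import timm

variants = {"efficientnet_b0": 224, "efficientnet_b2": 260,
            "efficientnet_b4": 380}
models = {
    name: timm.create_model(name, pretrained=True, num_classes=2)
    for name in variants
}
# Fine-tune each model on inputs resized to variants[name] x variants[name];
# the study reports that larger input resolutions yield significant gains.
```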
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, namely CNN-TT, maintains good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction and a tensor-train output layer at the top to reduce the parameter count while preserving speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)