Efficient Scopeformer: Towards Scalable and Rich Feature Extraction for
Intracranial Hemorrhage Detection
- URL: http://arxiv.org/abs/2302.00220v1
- Date: Wed, 1 Feb 2023 03:51:27 GMT
- Title: Efficient Scopeformer: Towards Scalable and Rich Feature Extraction for
Intracranial Hemorrhage Detection
- Authors: Yassine Barhoumi, Nidhal C. Bouaynaya, Ghulam Rasool
- Abstract summary: "Scopeformer" is a novel multi-CNN-ViT model for intracranial hemorrhage classification in computed tomography (CT) images.
We propose effective feature projection methods to reduce redundancies among CNN-generated features and to control the input size of ViTs.
Experiments with various Scopeformer models show that the model performance is proportional to the number of convolutional blocks employed in the feature extractor.
- Score: 0.7734726150561088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quality and richness of feature maps extracted by convolutional
neural networks (CNNs) and vision Transformers (ViTs) directly relate to robust
model performance. In medical computer vision, these information-rich features
are crucial for detecting rare cases within large datasets. This work presents
the "Scopeformer," a novel multi-CNN-ViT model for intracranial hemorrhage
classification in computed tomography (CT) images. The Scopeformer architecture
is scalable and modular, which allows utilizing various CNN architectures as
the backbone with diversified output features and pre-training strategies. We
propose effective feature projection methods to reduce redundancies among
CNN-generated features and to control the input size of ViTs. Extensive
experiments with various Scopeformer models show that the model performance is
proportional to the number of convolutional blocks employed in the feature
extractor. Using multiple strategies, including diversifying the pre-training
paradigms for CNNs, different pre-training datasets, and style transfer
techniques, we demonstrate an overall improvement in the model performance at
various computational budgets. We then propose smaller, compute-efficient
Scopeformer versions with three different types of input and output ViT
configurations. Efficient Scopeformers use four different pre-trained CNN
architectures as feature extractors to increase feature richness. Our best
Efficient Scopeformer model achieved an accuracy of 96.94% and a weighted
logarithmic loss of 0.083 with an eight times reduction in the number of
trainable parameters compared to the base Scopeformer. Another version of the
Efficient Scopeformer model further reduced the parameter space by almost 17
times with negligible performance reduction. Hybrid CNNs and ViTs might provide
the desired feature richness for developing accurate medical computer vision
models.
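To make the architecture described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' released code) of a multi-CNN-ViT hybrid: several CNN backbones produce feature maps, a 1x1-convolution projection reduces the concatenated channels to control the ViT token width, and a small Transformer encoder performs the classification. The backbone definitions, dimensions, and class count are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class TinyCNNBackbone(nn.Module):
    """Stand-in for one pre-trained CNN feature extractor (e.g. an Xception-like model)."""

    def __init__(self, out_channels: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)  # (B, C, H/8, W/8)


class MultiCNNViT(nn.Module):
    """Concatenate feature maps from several CNN backbones, project them with a
    1x1 convolution to reduce redundancy and control the ViT token width, then
    classify with a small Transformer encoder (positional embeddings omitted
    here for brevity)."""

    def __init__(self, num_backbones: int = 4, cnn_channels: int = 256,
                 embed_dim: int = 384, depth: int = 4, num_classes: int = 6):
        super().__init__()
        self.backbones = nn.ModuleList(
            [TinyCNNBackbone(cnn_channels) for _ in range(num_backbones)]
        )
        # Feature projection step: num_backbones * cnn_channels -> embed_dim
        self.project = nn.Conv2d(num_backbones * cnn_channels, embed_dim, kernel_size=1)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=6, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.backbones], dim=1)  # (B, n*C, h, w)
        tokens = self.project(feats).flatten(2).transpose(1, 2)   # (B, h*w, D)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(out[:, 0])  # one logit per hemorrhage class


if __name__ == "__main__":
    model = MultiCNNViT()
    logits = model(torch.randn(2, 3, 224, 224))  # two dummy CT slices
    print(logits.shape)  # torch.Size([2, 6])
```

The abstract also reports a weighted logarithmic loss of 0.083. A weighted multi-label log loss of the kind typically used for intracranial hemorrhage classification (e.g. the RSNA challenge metric) can be sketched as follows; the specific class weights are an assumption, not taken from the paper.

```python
import torch


def weighted_log_loss(probs, targets, class_weights):
    """Weighted multi-label log loss.

    probs, targets: (batch, num_classes) tensors; class_weights: (num_classes,).
    """
    eps = 1e-7
    probs = probs.clamp(eps, 1 - eps)
    bce = -(targets * probs.log() + (1 - targets) * (1 - probs).log())
    # Weighted mean over classes, then mean over the batch.
    return (bce * class_weights).sum(dim=1).div(class_weights.sum()).mean()


if __name__ == "__main__":
    # Assumed RSNA-style weighting: five subtypes plus an "any" label counted double.
    weights = torch.tensor([1.0, 1.0, 1.0, 1.0, 1.0, 2.0])
    probs = torch.rand(4, 6)
    targets = torch.randint(0, 2, (4, 6)).float()
    print(weighted_log_loss(probs, targets, weights))
```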
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses these demands by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Variational autoencoder-based neural network model compression [4.992476489874941]
Variational Autoencoders (VAEs), as a form of deep generative model, have been widely used in recent years.
This paper aims to explore neural network model compression method based on VAE.
arXiv Detail & Related papers (2024-08-25T09:06:22Z)
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [97.9548609175831]
We resort to plain vision transformers with about 100 million parameters and make the first attempt to propose large vision models customized for remote sensing tasks.
Specifically, to handle the large image size and objects of various orientations in RS images, we propose a new rotated varied-size window attention.
Experiments on detection tasks demonstrate the superiority of our model over all state-of-the-art models, achieving 81.16% mAP on the DOTA-V1.0 dataset.
arXiv Detail & Related papers (2022-08-08T09:08:40Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in the low illumination indoor scene.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning [14.412066456583917]
We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
arXiv Detail & Related papers (2022-01-11T20:15:35Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers; it directly translates the image feature map into the object detection result.
Experiments on a recent transformer-based image recognition model show a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Scopeformer: n-CNN-ViT Hybrid Model for Intracranial Hemorrhage Classification [0.0]
We propose a feature generator composed of an ensemble of convolutional neural networks (CNNs) to improve the Vision Transformer (ViT) models.
We show that by gradually stacking several feature maps extracted using multiple Xception CNNs, we can develop a feature-rich input for the ViT model.
arXiv Detail & Related papers (2021-07-07T20:20:24Z)