Video Capsule Endoscopy Classification using Focal Modulation Guided
Convolutional Neural Network
- URL: http://arxiv.org/abs/2206.08298v1
- Date: Thu, 16 Jun 2022 16:57:45 GMT
- Title: Video Capsule Endoscopy Classification using Focal Modulation Guided
Convolutional Neural Network
- Authors: Abhishek Srivastava, Nikhil Kumar Tomar, Ulas Bagci, Debesh Jha
- Abstract summary: We propose FocalConvNet, a focal modulation network integrated with lightweight convolutional layers for the classification of small bowel anatomical landmarks and luminal findings.
We report the highest throughput of 148.02 images/second to establish the potential of FocalConvNet in a real-time clinical environment.
- Score: 3.1374864575817214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video capsule endoscopy is a hot topic in computer vision and medicine. Deep
learning can have a positive impact on the future of video capsule endoscopy
technology. It can improve the anomaly detection rate, reduce physicians' time
for screening, and aid in real-world clinical analysis. CADx classification
systems for video capsule endoscopy have shown great promise for further
improvement. For example, detecting cancerous polyps and bleeding can lead to
a swift medical response and improve patient survival rates. To this
end, an automated CADx system must have high throughput and decent accuracy. In
this paper, we propose FocalConvNet, a focal modulation network integrated with
lightweight convolutional layers for the classification of small bowel
anatomical landmarks and luminal findings. FocalConvNet leverages focal
modulation to attain global context and allows global-local spatial
interactions throughout the forward pass. Moreover, the convolutional block
with its intrinsic inductive/learning bias and capacity to extract hierarchical
features allows our FocalConvNet to achieve favourable results with high
throughput. We compare our FocalConvNet with other state-of-the-art (SOTA)
methods on Kvasir-Capsule, a large-scale VCE dataset of 44,228 frames spanning
13 classes of anomalies. Our proposed method achieves a weighted F1-score,
recall and MCC of 0.6734, 0.6373 and 0.2974, respectively, outperforming other
SOTA methodologies. Furthermore, we report the highest throughput of 148.02
images/second, establishing the potential of FocalConvNet in a real-time
clinical environment. The code of the proposed FocalConvNet is available at
https://github.com/NoviceMAn-prog/FocalConvNet.
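The abstract describes focal modulation as a way to attain global context through global-local spatial interactions: a query is modulated elementwise by a gated aggregation of multi-scale context. The sketch below illustrates that idea on a 1-D token sequence in plain NumPy; it is a rough, simplified illustration under stated assumptions, not the authors' implementation (the paper's FocalConvNet operates on 2-D feature maps with learned depthwise convolutions). The projection matrices `w_q`, `w_g` and the `focal_windows` sizes are hypothetical placeholders.

```python
import numpy as np

def focal_modulation_1d(x, w_q, w_g, focal_windows=(3, 5, 7)):
    """Simplified focal modulation over a 1-D token sequence.

    x: (T, C) token features.
    w_q: (C, C) query projection (hypothetical).
    w_g: (C, L) gate projection, one gate per focal level (hypothetical).
    focal_windows: growing pooling windows that stand in for the
    hierarchical (depthwise-conv) contextualization of focal modulation.
    """
    T, C = x.shape
    L = len(focal_windows)
    q = x @ w_q                      # per-token query, (T, C)
    gates = x @ w_g                  # per-token, per-level gates, (T, L)

    # Hierarchical contextualization: mean-pool each token's neighbourhood
    # with progressively larger windows (coarser context at higher levels).
    contexts = []
    for w in focal_windows:
        r = w // 2
        ctx = np.stack([x[max(0, t - r): t + r + 1].mean(axis=0)
                        for t in range(T)])
        contexts.append(ctx)         # each (T, C)

    # Modulator: gated sum of the multi-scale contexts.
    m = sum(gates[:, l:l + 1] * contexts[l] for l in range(L))

    # Focal modulation: the query is modulated elementwise by the
    # aggregated context, giving each token global-local interaction.
    return q * m                     # (T, C)
```

In the real architecture the pooled contexts are produced by stacked depthwise convolutions with nonlinearities, and the modulated output is followed by a further projection; this sketch keeps only the query-times-gated-context structure that the abstract refers to.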
Related papers
- Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography [39.58317527488534]
This study compares multimodal and CNN-based methods for automated classification using the BI-RADS system. Zero-shot classification achieved modest performance, while the fine-tuned ConvNeXt model outperformed the BioMedCLIP linear probe. These findings suggest that despite the promise of multimodal learning, CNN-based models with end-to-end fine-tuning provide stronger performance for specialized medical imaging.
arXiv Detail & Related papers (2025-06-16T20:14:37Z) - KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation [46.57880203321858]
We propose a novel network (KaLDeX) for vascular segmentation leveraging a Kalman filter based linear deformable cross attention (LDCA) module.
Our approach is based on two key components: Kalman filter (KF) based linear deformable convolution (LD) and cross-attention (CA) modules.
The proposed method is evaluated on retinal fundus image datasets (DRIVE, CHASE_BD1, and STARE) as well as the 3mm and 6mm subsets of the OCTA-500 dataset.
arXiv Detail & Related papers (2024-10-28T16:00:42Z) - CAVE-Net: Classifying Abnormalities in Video Capsule Endoscopy [0.1937002985471497]
We propose an ensemble-based approach to improve diagnostic accuracy in analyzing complex image datasets.
We leverage the unique feature extraction capabilities of each model to enhance the overall accuracy.
By using these methods, the proposed framework, CAVE-Net, provides robust feature discrimination and improved classification results.
arXiv Detail & Related papers (2024-10-26T17:25:08Z) - Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning [0.0]
This report outlines Team Seq2Cure's deep learning approach for the Capsule Vision 2024 Challenge.
We leverage an ensemble of convolutional neural networks (CNNs) and transformer-based architectures for multi-class abnormality classification.
Our approach achieved a balanced accuracy of 86.34 percent and a mean AUC-ROC score of 0.9908 on the validation set.
arXiv Detail & Related papers (2024-10-24T16:13:06Z) - Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas.
This study aims to utilize a specially MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z) - Classification of Endoscopy and Video Capsule Images using CNN-Transformer Model [1.0994755279455526]
This study proposes a hybrid model that combines the advantages of Transformers and Convolutional Neural Networks (CNNs) to enhance classification performance.
For the GastroVision dataset, our proposed model demonstrates excellent performance with Precision, Recall, F1 score, Accuracy, and Matthews Correlation Coefficient (MCC) of 0.8320, 0.8386, 0.8324, 0.8386, and 0.8191, respectively.
arXiv Detail & Related papers (2024-08-20T11:05:32Z) - Automatic diagnosis of knee osteoarthritis severity using Swin
transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint.
We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z) - Medulloblastoma Tumor Classification using Deep Transfer Learning with
Multi-Scale EfficientNets [63.62764375279861]
We propose an end-to-end MB tumor classification and explore transfer learning with various input sizes and matching network dimensions.
Using a data set with 161 cases, we demonstrate that pre-trained EfficientNets with larger input resolutions lead to significant performance improvements.
arXiv Detail & Related papers (2021-09-10T13:07:11Z) - COVID-Net US: A Tailored, Highly Efficient, Self-Attention Deep
Convolutional Neural Network Design for Detection of COVID-19 Patient Cases
from Point-of-care Ultrasound Imaging [101.27276001592101]
We introduce COVID-Net US, a highly efficient, self-attention deep convolutional neural network design tailored for COVID-19 screening from lung POCUS images.
Experimental results show that the proposed COVID-Net US can achieve an AUC of over 0.98 while achieving 353X lower architectural complexity, 62X lower computational complexity, and 14.3X faster inference times on a Raspberry Pi.
To advocate affordable healthcare and artificial intelligence for resource-constrained environments, we have made COVID-Net US open source and publicly available as part of the COVID-Net open source initiative.
arXiv Detail & Related papers (2021-08-05T16:47:33Z) - Acute Lymphoblastic Leukemia Detection from Microscopic Images Using
Weighted Ensemble of Convolutional Neural Networks [4.095759108304108]
This article automates the ALL detection task from microscopic cell images, employing deep Convolutional Neural Networks (CNNs).
Various data augmentations and pre-processing are incorporated for achieving a better generalization of the network.
Our proposed weighted ensemble model, using the kappa values of the ensemble candidates as their weights, achieved a weighted F1-score of 88.6%, a balanced accuracy of 86.2%, and an AUC of 0.941 on the preliminary test set.
arXiv Detail & Related papers (2021-05-09T18:58:48Z) - Intrapapillary Capillary Loop Classification in Magnification Endoscopy:
Open Dataset and Baseline Methodology [8.334256673330879]
We build a computer-assisted detection system that can classify still images or video frames as normal or abnormal.
We present a new benchmark dataset containing 68K binary labeled frames extracted from 114 patient videos.
The proposed method achieved an average accuracy of 91.7%, compared to the 94.7% achieved by a group of 12 senior clinicians.
arXiv Detail & Related papers (2021-02-19T14:55:21Z) - DDANet: Dual Decoder Attention Network for Automatic Polyp Segmentation [0.3734402152170273]
We propose a novel architecture called DDANet based on a dual decoder attention network.
Experiments demonstrate that the model trained on the Kvasir-SEG dataset and tested on an unseen dataset achieves a dice coefficient of 0.7874, mIoU of 0.7010, recall of 0.7987, and a precision of 0.8577.
arXiv Detail & Related papers (2020-12-30T17:52:35Z) - Comparisons among different stochastic selection of activation layers
for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select our activations from the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, Soft Root Sign.
arXiv Detail & Related papers (2020-11-24T01:53:39Z) - Classification of COVID-19 in CT Scans using Multi-Source Transfer
Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.