A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR
- URL: http://arxiv.org/abs/2406.08046v1
- Date: Wed, 12 Jun 2024 09:58:42 GMT
- Title: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR
- Authors: Sasidhar Alavala, Anil Kumar Vadde, Aparnamala Kancheti, Subrahmanyam Gorthi
- Abstract summary: The solution combines the Swin Transformer for initial classification of bleeding frames with RT-DETR for subsequent detection of bleeding regions.
On the validation set, this approach achieves a classification accuracy of 98.5% compared to 91.7% without any pre-processing.
On the test set, this approach achieves a classification accuracy and F1 score of 87.0% and 89.0% respectively.
- Score: 1.7499351967216343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present our approach to the Auto WCEBleedGen Challenge V2 2024. Our solution combines the Swin Transformer for the initial classification of bleeding frames and RT-DETR for further detection of bleeding in Wireless Capsule Endoscopy (WCE), enhanced by a series of image preprocessing steps. These steps include converting images to Lab colour space, applying Contrast Limited Adaptive Histogram Equalization (CLAHE) for better contrast, and using Gaussian blur to suppress artefacts. The Swin Transformer utilizes a tiered architecture with shifted windows to efficiently manage self-attention calculations, focusing on local windows while enabling cross-window interactions. RT-DETR features an efficient hybrid encoder for fast processing of multi-scale features and uncertainty-minimal query selection for enhanced accuracy. The class activation maps produced by Ablation-CAM provide plausible explanations of the model's decisions. On the validation set, this approach achieves a classification accuracy of 98.5% (the best among the state-of-the-art models compared), versus 91.7% without any pre-processing, and an $\text{AP}_{50}$ of 66.7%, compared to 65.0% for the state-of-the-art YOLOv8. On the test set, this approach achieves a classification accuracy of 87.0% and an F1 score of 89.0%.
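As a rough illustration of the pre-processing described above, the sketch below applies the Lab conversion, CLAHE, and Gaussian blur steps with OpenCV. The clip limit, tile grid, and kernel size are illustrative assumptions, since the abstract does not state the exact values used.

```python
import cv2
import numpy as np

def preprocess_wce_frame(bgr: np.ndarray) -> np.ndarray:
    """Lab conversion + CLAHE on the lightness channel + Gaussian blur.

    Clip limit, tile grid, and kernel size are illustrative guesses;
    the paper's exact parameter values are not stated here.
    """
    # Convert to Lab so contrast enhancement touches only lightness.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)

    # CLAHE boosts local contrast without over-amplifying noise.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l = clahe.apply(l)

    enhanced = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

    # A light Gaussian blur suppresses specular artefacts.
    return cv2.GaussianBlur(enhanced, (5, 5), 0)
```

A minimal sketch of the two-stage pipeline follows, gating RT-DETR detection on the Swin classifier's output. The timm and Hugging Face checkpoints (`swin_tiny_patch4_window7_224`, `PekingU/rtdetr_r50vd`) are public stand-ins, not the authors' fine-tuned weights, and the class index and thresholds are assumptions.

```python
import timm
import torch
from transformers import RTDetrImageProcessor, RTDetrForObjectDetection

# Public stand-in checkpoints; the authors' fine-tuned weights are not assumed here.
classifier = timm.create_model("swin_tiny_patch4_window7_224",
                               pretrained=True, num_classes=2).eval()
processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")
detector = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd").eval()

def analyse(frame_bgr: "np.ndarray"):
    """Classify a pre-processed frame; run detection only on bleeding frames."""
    rgb = frame_bgr[..., ::-1].copy()
    x = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    x = torch.nn.functional.interpolate(x, size=(224, 224), mode="bilinear")
    with torch.no_grad():
        p_bleed = classifier(x).softmax(-1)[0, 1]  # class index 1 = bleeding (assumed)
        if p_bleed < 0.5:
            return None                            # skip detection on clean frames
        inputs = processor(images=rgb, return_tensors="pt")
        outputs = detector(**inputs)
    return processor.post_process_object_detection(
        outputs, target_sizes=torch.tensor([rgb.shape[:2]]), threshold=0.5)[0]
```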
Related papers
- Enhancing DR Classification with Swin Transformer and Shifted Window Attention [9.99302279736049]
Diabetic retinopathy (DR) is a leading cause of blindness worldwide, underscoring the importance of early detection for effective treatment.
We propose a robust preprocessing pipeline incorporating image cropping, Contrast-Limited Adaptive Histogram Equalization (CLAHE), and targeted data augmentation to improve model generalization and resilience.
We validate our method on the Aptos and IDRiD datasets for multi-class DR classification, achieving accuracy rates of 89.65% and 97.40%, respectively.
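As a hedged sketch of such a pre-processing and augmentation pipeline, the snippet below chains a CLAHE step with standard torchvision augmentations; the crop size, rotation range, and clip limit are illustrative choices, not the paper's configuration.

```python
import cv2
import numpy as np
from torchvision import transforms

class CLAHETransform:
    """Apply CLAHE to the L channel of an (H, W, 3) RGB uint8 array."""
    def __init__(self, clip_limit=2.0, grid=(8, 8)):
        self.clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)

    def __call__(self, img: np.ndarray) -> np.ndarray:
        lab = cv2.cvtColor(img, cv2.COLOR_RGB2LAB)
        lab[..., 0] = self.clahe.apply(lab[..., 0])
        return cv2.cvtColor(lab, cv2.COLOR_LAB2RGB)

# Expects an RGB uint8 numpy array as input (not a PIL image).
train_tf = transforms.Compose([
    CLAHETransform(),
    transforms.ToPILImage(),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # illustrative crop
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])
```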
arXiv Detail & Related papers (2025-04-20T13:23:20Z)
- CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification [65.46685389276443]
We ground our work on CLIP, a vision-language pre-trained encoder model that can perform zero-shot classification by matching an image with text prompts.
We then formulate purification risk as the KL divergence between the joint distributions of the purification process and the attack process.
We propose two variants of our CLIPure approach: CLIPure-Diff, which models the likelihood of images' latent vectors, and CLIPure-Cos, which models the likelihood via the cosine similarity between the embeddings of an image and the text prompt "a photo of a."
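For context, the snippet below shows the plain CLIP zero-shot matching that CLIPure-Cos builds its likelihood on, scoring an image against "a photo of a <class>." prompts via cosine similarity; it sketches only the base mechanism, not the purification step.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot_probs(image: Image.Image, labels: list[str]) -> torch.Tensor:
    """Zero-shot class probabilities from image-text cosine similarity."""
    prompts = [f"a photo of a {c}." for c in labels]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image is the scaled cosine similarity between the image
    # embedding and each prompt embedding.
    return out.logits_per_image.softmax(dim=-1)
```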
arXiv Detail & Related papers (2025-02-25T13:09:34Z)
- Transformer-Based Wireless Capsule Endoscopy Bleeding Tissue Detection and Classification [0.562479170374811]
We design an end-to-end trainable model for the automatic detection and classification of bleeding and non-bleeding frames.
Based on the DETR model, our model uses a ResNet50 backbone for feature extraction, a transformer encoder-decoder for bleeding and non-bleeding region detection, and a feedforward neural network for classification.
Trained in an end-to-end approach on the Auto-WCEBleedGen Version 1 challenge training set, our model performs both detection and classification tasks as a single unit.
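A minimal inference sketch with a DETR model of the same family is shown below, using the public COCO checkpoint facebook/detr-resnet-50 as a stand-in for the authors' fine-tuned bleeding detector; the file name and threshold are illustrative.

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

# Public COCO checkpoint as a stand-in for the authors' model, which
# fine-tunes a DETR-style ResNet50 + encoder-decoder on Auto-WCEBleedGen V1.
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").eval()

image = Image.open("frame.png").convert("RGB")  # hypothetical WCE frame
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep detections above a confidence threshold; target_sizes is (H, W).
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.7)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(f"label={label.item()} score={score.item():.2f} box={box.tolist()}")
```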
arXiv Detail & Related papers (2024-12-26T13:49:39Z)
- Automated Bleeding Detection and Classification in Wireless Capsule Endoscopy with YOLOv8-X [2.6374023322018916]
This paper presents our solution to the Auto-WCEBleedGen Version V1 Challenge.
We developed a unified YOLOv8-X model for both detection and classification of bleeding regions.
Our approach achieved 96.10% classification accuracy and 76.8% mean Average Precision (mAP) at 0.5 IoU on the validation dataset.
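A training and evaluation sketch with the Ultralytics API is shown below; the dataset YAML name, epoch count, and image size are illustrative assumptions rather than the authors' settings.

```python
from ultralytics import YOLO

# COCO-pretrained YOLOv8-X as the starting point; the dataset YAML path,
# epoch count, and image size are illustrative, not the authors' settings.
model = YOLO("yolov8x.pt")
model.train(data="wce_bleedgen.yaml", epochs=100, imgsz=640)  # hypothetical YAML

metrics = model.val()          # reports mAP50 / mAP50-95 on the val split
results = model("frame.png")   # inference on a single frame (hypothetical file)
for box in results[0].boxes:
    print(int(box.cls), float(box.conf), box.xyxy.tolist())
```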
arXiv Detail & Related papers (2024-12-21T13:37:11Z)
- Capsule Endoscopy Multi-classification via Gated Attention and Wavelet Transformations [1.5146068448101746]
Abnormalities in the gastrointestinal tract significantly influence the patient's health and require a timely diagnosis.
This work presents the development and evaluation of a novel model designed to classify gastrointestinal anomalies from video frames.
The integration of the Omni Dimensional Gated Attention (OGA) mechanism and wavelet transformation techniques into the model's architecture allowed the model to focus on the most critical areas.
The model's performance is benchmarked against two base models, VGG16 and ResNet50, demonstrating its enhanced ability to identify and classify a range of gastrointestinal abnormalities accurately.
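To make the wavelet component concrete, the sketch below computes a single-level 2-D discrete wavelet transform with PyWavelets; the 'haar' family is an illustrative choice, as the summary does not name the wavelet used.

```python
import numpy as np
import pywt

# Single-level 2-D discrete wavelet transform of a grayscale frame.
frame = np.random.rand(224, 224).astype(np.float32)  # stand-in image
cA, (cH, cV, cD) = pywt.dwt2(frame, "haar")

# cA: low-frequency approximation; cH/cV/cD: horizontal, vertical and
# diagonal detail coefficients that highlight edges and texture.
print(cA.shape, cH.shape)  # each sub-band is half resolution: (112, 112)
```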
arXiv Detail & Related papers (2024-10-25T08:01:35Z)
- Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations.
In this study, we proposed a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
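The sketch below shows the generic shared-encoder multitask pattern (one backbone feeding a classification head and a segmentation head) that such hybrid designs follow; it is a simplified analogue, not the published Hybrid-MT-ESTAN architecture.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared encoder with classification and segmentation heads
    (a toy analogue of a multitask BUS network, not the paper's model)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes))
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2))

    def forward(self, x):
        feats = self.encoder(x)           # shared features for both tasks
        return self.cls_head(feats), self.seg_head(feats)

x = torch.randn(4, 1, 256, 256)           # batch of grayscale BUS images
logits, mask = MultiTaskNet()(x)
print(logits.shape, mask.shape)            # (4, 2) and (4, 1, 256, 256)
```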
arXiv Detail & Related papers (2023-08-04T01:19:32Z)
- Comparison of retinal regions-of-interest imaged by OCT for the classification of intermediate AMD [3.0171643773711208]
A total of 15744 B-scans from 269 intermediate AMD patients and 115 normal subjects were used in this study.
For each subset, a convolutional neural network (based on the VGG16 architecture and pre-trained on ImageNet) was trained and tested.
The performance of the models was evaluated using the area under the receiver operating characteristic (AUROC), accuracy, sensitivity, and specificity.
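A hedged sketch of this transfer-learning and evaluation setup is given below: an ImageNet-pretrained VGG16 re-headed for the binary task and scored with AUROC. Freezing the convolutional features is an assumption, not a detail from the summary.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import roc_auc_score

# ImageNet-pretrained VGG16 with the classifier re-headed for the binary
# task (intermediate AMD vs. normal); feature freezing is an assumption.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False              # fine-tune only the classifier
model.classifier[6] = nn.Linear(4096, 1)

def evaluate(model, loader, device="cpu") -> float:
    """AUROC over a labelled B-scan loader, as reported in the paper."""
    model.eval().to(device)
    ys, ps = [], []
    with torch.no_grad():
        for x, y in loader:
            p = torch.sigmoid(model(x.to(device))).squeeze(1).cpu()
            ys += y.tolist()
            ps += p.tolist()
    return roc_auc_score(ys, ps)
```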
arXiv Detail & Related papers (2023-05-04T13:48:55Z)
- Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification [52.77024349608834]
This study investigates the chest radiograph (CXR) classification performance of vision transformers (ViTs) and the interpretability of attention-based saliency maps.
ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData.
ViTs achieved CXR classification AUCs comparable to those of state-of-the-art CNNs.
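As one common way to derive attention-based saliency from a fine-tuned ViT, the sketch below extracts last-layer CLS-to-patch attention with Hugging Face Transformers; the paper's exact saliency method (e.g., attention rollout) may differ, and the checkpoint and file name are stand-ins.

```python
import torch
from PIL import Image
from transformers import ViTModel, ViTImageProcessor

# Public ViT checkpoint as a stand-in for a CXR-fine-tuned model.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTModel.from_pretrained("google/vit-base-patch16-224").eval()

image = Image.open("cxr.png").convert("RGB")   # hypothetical CXR file
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

att = out.attentions[-1]             # (1, heads, tokens, tokens)
cls_att = att[0, :, 0, 1:].mean(0)   # mean over heads, CLS -> patch tokens
saliency = cls_att.reshape(14, 14)   # 224 / 16 = 14 patches per side
```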
arXiv Detail & Related papers (2023-03-03T12:05:41Z)
- Enhanced Sharp-GAN For Histopathology Image Synthesis [63.845552349914186]
Histopathology image synthesis aims to address the data shortage issue in training deep learning approaches for accurate cancer detection.
We propose a novel approach that enhances the quality of synthetic images by using nuclei topology and contour regularization.
The proposed approach outperforms Sharp-GAN in all four image quality metrics on two datasets.
arXiv Detail & Related papers (2023-01-24T17:54:01Z)
- Hybrid guiding: A multi-resolution refinement approach for semantic segmentation of gigapixel histopathological images [0.7490318169877296]
We propose a cascaded convolutional neural network design, called H2G-Net, for semantic segmentation.
The design involves a detection stage using a patch-wise method and a refinement stage using a convolutional autoencoder.
The best design achieved a Dice score of 0.933 on an independent test set of 90 WSIs.
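A toy sketch of the cascade idea follows: a patch-wise classifier builds a coarse heatmap over the image, which a small convolutional autoencoder then refines. Both networks are placeholder stand-ins, not H2G-Net's actual stages.

```python
import torch
import torch.nn as nn

# Placeholder patch classifier: flattens a 64x64 RGB patch to one logit.
patch_clf = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))

def coarse_heatmap(wsi: torch.Tensor, patch: int = 64) -> torch.Tensor:
    """wsi: (3, H, W) with H and W divisible by `patch`."""
    _, H, W = wsi.shape
    rows = []
    for i in range(0, H, patch):
        row = [torch.sigmoid(patch_clf(wsi[:, i:i+patch, j:j+patch].reshape(1, -1)))
               for j in range(0, W, patch)]
        rows.append(torch.cat(row, dim=1))
    return torch.cat(rows, dim=0)           # (H/patch, W/patch) probabilities

refiner = nn.Sequential(                     # stand-in refinement autoencoder
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

heat = coarse_heatmap(torch.rand(3, 512, 512))
refined = refiner(heat[None, None])          # (1, 1, 8, 8) refined mask
```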
arXiv Detail & Related papers (2021-12-07T02:31:29Z)
- The Report on China-Spain Joint Clinical Testing for Rapid COVID-19 Risk Screening by Eye-region Manifestations [59.48245489413308]
We developed and tested a COVID-19 rapid prescreening model using the eye-region images captured in China and Spain with cellphone cameras.
The performance was measured using area under receiver-operating-characteristic curve (AUC), sensitivity, specificity, accuracy, and F1.
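These metrics can be computed from model scores as sketched below with scikit-learn; the 0.5 decision threshold is an illustrative assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix, f1_score

def screening_metrics(y_true, y_score, threshold=0.5) -> dict:
    """AUC, sensitivity, specificity, accuracy and F1 from model scores."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "auc": roc_auc_score(y_true, y_score),
        "sensitivity": tp / (tp + fn),   # recall on positives
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": f1_score(y_true, y_pred),
    }
```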
arXiv Detail & Related papers (2021-09-18T02:28:01Z)
- Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fractures, using the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- Exploring the Effect of Image Enhancement Techniques on COVID-19 Detection using Chest X-rays Images [4.457871213347773]
This paper explores several popular image enhancement techniques and reports the effect of each on detection performance.
We have compiled COVQU-20, the largest such X-ray dataset, consisting of 18,479 normal, non-COVID lung opacity, and COVID-19 CXR images.
The accuracy, precision, sensitivity, f1-score, and specificity in the detection of COVID-19 with gamma correction on CXR images were 96.29%, 96.28%, 96.29%, 96.28% and 96.27% respectively.
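Gamma correction itself is a one-line look-up-table operation, sketched below; the gamma value is illustrative, not the paper's tuned setting.

```python
import cv2
import numpy as np

def gamma_correct(img: np.ndarray, gamma: float = 0.8) -> np.ndarray:
    """Look-up-table gamma correction for uint8 CXR images.

    gamma < 1 brightens, gamma > 1 darkens; 0.8 is an illustrative value.
    """
    lut = (np.linspace(0, 1, 256) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, lut)
```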
arXiv Detail & Related papers (2020-11-25T20:58:27Z)
- Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
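A hedged sketch of the multi-source schedule is shown below: starting from ImageNet weights, the model is fine-tuned on several source datasets before the COVID-19 CT target. The loaders, backbone, and epoch counts are hypothetical stand-ins, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

def fine_tune(model, loader, epochs=5, lr=1e-4):
    """Standard supervised fine-tuning pass over one dataset."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Hypothetical loaders: replace with real DataLoaders for each source
# dataset and for the COVID-19 CT target set.
medical_source_loaders: list = []   # e.g. other medical imaging datasets
covid_ct_loader: list = []

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)   # COVID vs. non-COVID
for source_loader in medical_source_loaders:    # multi-source stage
    fine_tune(model, source_loader)
fine_tune(model, covid_ct_loader, epochs=20)    # final target stage
```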
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
- COVIDLite: A depth-wise separable deep neural network with white balance and CLAHE for detection of COVID-19 [1.1139113832077312]
COVIDLite combines white balance correction, Contrast Limited Adaptive Histogram Equalization (CLAHE), and a depth-wise separable convolutional neural network (DSCNN).
COVIDLite improved performance compared with a vanilla DSCNN trained without pre-processing.
The method achieved a higher accuracy of 99.58% for binary classification and 96.43% for multi-class classification, outperforming various state-of-the-art methods.
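The DSCNN building block is sketched below: a depthwise convolution (one filter per channel) followed by a 1x1 pointwise convolution, which is what makes such networks lightweight; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a 1x1 pointwise conv, the building
    block behind DSCNNs such as COVIDLite (sizes are illustrative)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return torch.relu(self.bn(self.pointwise(self.depthwise(x))))

block = DepthwiseSeparableConv(32, 64)
print(block(torch.randn(1, 32, 112, 112)).shape)  # (1, 64, 112, 112)
```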
arXiv Detail & Related papers (2020-06-19T02:30:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.