A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR
- URL: http://arxiv.org/abs/2406.08046v1
- Date: Wed, 12 Jun 2024 09:58:42 GMT
- Title: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR
- Authors: Sasidhar Alavala, Anil Kumar Vadde, Aparnamala Kancheti, Subrahmanyam Gorthi
- Abstract summary: The solution combines the Swin Transformer for the initial classification of bleeding frames and RT-DETR for further detection of bleeding.
On the validation set, this approach achieves a classification accuracy of 98.5% compared to 91.7% without any pre-processing.
On the test set, this approach achieves a classification accuracy and F1 score of 87.0% and 89.0% respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present our approach to the Auto WCEBleedGen Challenge V2 2024. Our solution combines the Swin Transformer for the initial classification of bleeding frames and RT-DETR for further detection of bleeding in Wireless Capsule Endoscopy (WCE), enhanced by a series of image preprocessing steps. These steps include converting images to Lab colour space, applying Contrast Limited Adaptive Histogram Equalization (CLAHE) for better contrast, and using Gaussian blur to suppress artefacts. The Swin Transformer utilizes a tiered architecture with shifted windows to efficiently manage self-attention calculations, focusing on local windows while enabling cross-window interactions. RT-DETR features an efficient hybrid encoder for fast processing of multi-scale features and uncertainty-minimal query selection for enhanced accuracy. The class activation maps produced by Ablation-CAM offer plausible explanations of the model's decisions. On the validation set, this approach achieves a classification accuracy of 98.5% (the best among the state-of-the-art models compared) compared to 91.7% without any pre-processing, and an $\text{AP}_{50}$ of 66.7% compared to 65.0% with the state-of-the-art YOLOv8. On the test set, this approach achieves a classification accuracy and F1 score of 87.0% and 89.0%, respectively.
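As a concrete illustration of the preprocessing described above, here is a minimal sketch in Python with OpenCV: Lab conversion, CLAHE on the lightness channel, and a Gaussian blur. The clip limit, tile grid, and kernel size are illustrative assumptions; the paper's exact settings are not given here.

```python
import cv2
import numpy as np

def preprocess_wce_frame(bgr: np.ndarray) -> np.ndarray:
    """Enhance a WCE frame before classification/detection (sketch)."""
    # Convert to Lab so contrast enhancement only touches lightness (L),
    # leaving the colour-opponent channels (a, b) untouched.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)

    # CLAHE for better local contrast; clip limit / tile grid are assumed values.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    l = clahe.apply(l)

    enhanced = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

    # Light Gaussian blur to suppress artefacts; kernel size is an assumption.
    return cv2.GaussianBlur(enhanced, (5, 5), 0)
```

The two-stage pipeline itself can be sketched in the same spirit: a Swin Transformer classifier gates each frame, and only frames predicted as bleeding are handed to the RT-DETR detector. The `timm` checkpoint name, the two-class head, and the class-index convention below are assumptions, not the authors' released configuration.

```python
import timm
import torch

# Swin-Tiny stand-in for the classification stage; the paper's exact Swin
# variant and weights are assumptions here.
classifier = timm.create_model(
    "swin_tiny_patch4_window7_224", pretrained=True, num_classes=2
)
classifier.eval()

def is_bleeding(frame: torch.Tensor) -> bool:
    """frame: (1, 3, 224, 224) tensor, preprocessed and normalized."""
    with torch.no_grad():
        logits = classifier(frame)
    # Class 1 = bleeding is an assumed label convention.
    return logits.argmax(dim=1).item() == 1

# Frames flagged as bleeding would then be passed to an RT-DETR detector
# for bounding-box localization (detector code omitted in this sketch).
```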
Related papers
- Transformer-Based Wireless Capsule Endoscopy Bleeding Tissue Detection and Classification
We design an end-to-end trainable model for the automatic detection and classification of bleeding and non-bleeding frames.
Based on the DETR model, our model uses ResNet50 for feature extraction, a transformer encoder-decoder for bleeding and non-bleeding region detection, and a feedforward neural network for classification.
Trained in an end-to-end approach on the Auto-WCEBleedGen Version 1 challenge training set, our model performs both detection and classification tasks as a single unit.
arXiv Detail & Related papers (2024-12-26T13:49:39Z)
- Automated Bleeding Detection and Classification in Wireless Capsule Endoscopy with YOLOv8-X
This paper presents our solution to the Auto-WCEBleedGen Version V1 Challenge.
We developed a unified YOLOv8-X model for both detection and classification of bleeding regions.
Our approach achieved 96.10% classification accuracy and 76.8% mean Average Precision (mAP) at 0.5 IoU on the validation dataset.
arXiv Detail & Related papers (2024-12-21T13:37:11Z)
- Capsule Endoscopy Multi-classification via Gated Attention and Wavelet Transformations
Abnormalities in the gastrointestinal tract significantly influence the patient's health and require a timely diagnosis.
The work presents the process of developing and evaluating a novel model designed to classify gastrointestinal anomalies from a video frame.
The integration of an Omni Dimensional Gated Attention (OGA) mechanism and wavelet transformation techniques into the model's architecture allowed it to focus on the most critical areas.
The model's performance is benchmarked against two base models, VGG16 and ResNet50, demonstrating its enhanced ability to identify and classify a range of gastrointestinal abnormalities accurately.
arXiv Detail & Related papers (2024-10-25T08:01:35Z)
- Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations.
In this study, we proposed a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
arXiv Detail & Related papers (2023-08-04T01:19:32Z)
- Comparison of retinal regions-of-interest imaged by OCT for the classification of intermediate AMD
A total of 15744 B-scans from 269 intermediate AMD patients and 115 normal subjects were used in this study.
For each subset, a convolutional neural network (based on VGG16 architecture and pre-trained on ImageNet) was trained and tested.
The performance of the models was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2023-05-04T13:48:55Z)
- Attention-based Saliency Maps Improve Interpretability of Pneumothorax Classification
The study investigates the chest radiograph (CXR) classification performance of vision transformers (ViTs) and the interpretability of attention-based saliency maps.
ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData.
ViTs achieved CXR classification AUCs comparable with those of state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z)
- Enhanced Sharp-GAN For Histopathology Image Synthesis
Histopathology image synthesis aims to address the data shortage issue in training deep learning approaches for accurate cancer detection.
We propose a novel approach that enhances the quality of synthetic images by using nuclei topology and contour regularization.
The proposed approach outperforms Sharp-GAN in all four image quality metrics on two datasets.
arXiv Detail & Related papers (2023-01-24T17:54:01Z)
- Hybrid guiding: A multi-resolution refinement approach for semantic segmentation of gigapixel histopathological images
We propose a cascaded convolutional neural network design, called H2G-Net, for semantic segmentation.
The design involves a detection stage using a patch-wise method and a refinement stage using a convolutional autoencoder.
The best design achieved a Dice score of 0.933 on an independent test set of 90 whole-slide images (WSIs).
arXiv Detail & Related papers (2021-12-07T02:31:29Z)
- The Report on China-Spain Joint Clinical Testing for Rapid COVID-19 Risk Screening by Eye-region Manifestations
We developed and tested a COVID-19 rapid prescreening model using the eye-region images captured in China and Spain with cellphone cameras.
The performance was measured using area under receiver-operating-characteristic curve (AUC), sensitivity, specificity, accuracy, and F1.
arXiv Detail & Related papers (2021-09-18T02:28:01Z)
- Vision Transformers for femur fracture classification
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using what the authors describe as the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)