Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning
- URL: http://arxiv.org/abs/2410.18879v2
- Date: Sun, 01 Dec 2024 11:58:37 GMT
- Title: Multi-Class Abnormality Classification in Video Capsule Endoscopy Using Deep Learning
- Authors: Arnav Samal, Ranya Batsyas
- Abstract summary: This report outlines Team Seq2Cure's deep learning approach for the Capsule Vision 2024 Challenge. We leverage an ensemble of convolutional neural networks (CNNs) and transformer-based architectures for multi-class abnormality classification in video capsule endoscopy frames. Our approach achieved a balanced accuracy of 86.34 percent and a mean AUC-ROC score of 0.9908 on the validation set, earning our submission 5th place in the challenge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report outlines Team Seq2Cure's deep learning approach for the Capsule Vision 2024 Challenge, leveraging an ensemble of convolutional neural networks (CNNs) and transformer-based architectures for multi-class abnormality classification in video capsule endoscopy frames. The dataset comprised over 50,000 frames from three public sources and one private dataset, labeled across 10 abnormality classes. To overcome the limitations of traditional CNNs in capturing global context, we integrated CNN and transformer models within a multi-model ensemble. Our approach achieved a balanced accuracy of 86.34 percent and a mean AUC-ROC score of 0.9908 on the validation set, earning our submission 5th place in the challenge. Code is available at http://github.com/arnavs04/capsule-vision-2024 .
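The abstract describes a multi-model ensemble of CNN and transformer backbones but not the exact models or fusion rule; the sketch below is a minimal, hypothetical probability-averaging ensemble over the 10 abnormality classes, using torchvision backbones as stand-ins (the authors' actual implementation is in the linked repository).

```python
# Minimal, hypothetical sketch of a CNN + transformer probability-averaging
# ensemble for 10-class VCE frame classification. Backbones, weights, and
# preprocessing are placeholders, not the authors' exact configuration.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # abnormality classes in the challenge dataset

class CapsuleEnsemble(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        # CNN branch (pretrained ImageNet weights would normally be loaded here).
        self.cnn = models.resnet50(weights=None)
        self.cnn.fc = nn.Linear(self.cnn.fc.in_features, num_classes)
        # Transformer branch: ViT-B/16 with its classification head replaced.
        self.vit = models.vit_b_16(weights=None)
        self.vit.heads = nn.Linear(self.vit.hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average class probabilities from the two branches.
        p_cnn = torch.softmax(self.cnn(x), dim=1)
        p_vit = torch.softmax(self.vit(x), dim=1)
        return (p_cnn + p_vit) / 2

model = CapsuleEnsemble().eval()
with torch.no_grad():
    probs = model(torch.randn(4, 3, 224, 224))  # 4 frames, 224x224 RGB
print(probs.shape)  # torch.Size([4, 10])
```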
Related papers
- Perception Encoder: The best visual embeddings are not at the output of the network [70.86738083862099]
We introduce Perception Encoder (PE), a vision encoder for image and video understanding trained via simple vision-language learning.
We find that contrastive vision-language training alone can produce strong, general embeddings for a wide range of downstream tasks.
Together, our PE family of models achieves best-in-class results on a wide variety of tasks.
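For context, the "contrastive vision-language training" referred to above is a CLIP-style objective; a minimal sketch of the symmetric image-text contrastive loss (the encoders and batch construction are assumed, not taken from the paper) is:

```python
# Hypothetical sketch of a symmetric image-text contrastive (CLIP-style) loss.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    # Normalize embeddings and score every image against every caption in the batch.
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(logits.size(0))  # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```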
arXiv Detail & Related papers (2025-04-17T17:59:57Z) - Capsule Vision Challenge 2024: Multi-Class Abnormality Classification for Video Capsule Endoscopy [1.124958340749622]
We present an approach to developing a model for classifying abnormalities in video capsule endoscopy (VCE) frames.
We implement a tiered augmentation strategy using the albumentations library to enhance minority class representation.
Our pipeline, developed in PyTorch, employs a flexible architecture enabling seamless adjustments to classification complexity.
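The exact tiers and transforms are not given in this summary; a hypothetical class-conditional policy with albumentations, in which minority classes receive a heavier pipeline than majority classes, might look like:

```python
# Hypothetical tiered augmentation policy with albumentations: minority
# classes get a heavier pipeline. Transforms and class names are illustrative.
import albumentations as A
from albumentations.pytorch import ToTensorV2

light = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Normalize(),
    ToTensorV2(),
])

heavy = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=30, p=0.7),
    A.RandomBrightnessContrast(p=0.5),
    A.GaussNoise(p=0.3),
    A.Normalize(),
    ToTensorV2(),
])

MINORITY_CLASSES = {"bleeding", "polyp"}  # placeholder class names

def transform_for(label: str):
    """Pick the augmentation tier based on the frame's class label."""
    return heavy if label in MINORITY_CLASSES else light

# usage: aug = transform_for(label); tensor = aug(image=frame_rgb)["image"]
```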
arXiv Detail & Related papers (2024-11-03T08:34:04Z) - Multi-Class Abnormality Classification Task in Video Capsule Endoscopy [3.656114607436271]
We address the challenge of multi-class anomaly classification in Video Capsule Endoscopy (VCE) with a variety of deep learning models.
The purpose is to correctly classify diverse gastrointestinal disorders, which is critical for increasing diagnostic efficiency in clinical settings.
arXiv Detail & Related papers (2024-10-25T21:22:52Z) - Classification of Endoscopy and Video Capsule Images using CNN-Transformer Model [1.0994755279455526]
This study proposes a hybrid model that combines the advantages of Transformers and Convolutional Neural Networks (CNNs) to enhance classification performance.
For the GastroVision dataset, our proposed model demonstrates excellent performance with Precision, Recall, F1 score, Accuracy, and Matthews Correlation Coefficient (MCC) of 0.8320, 0.8386, 0.8324, 0.8386, and 0.8191, respectively.
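The fusion mechanism is not detailed in this summary; one common hybrid design, concatenating pooled CNN features with a ViT class embedding before a shared classifier, is sketched below (backbones, dimensions, and class count are assumptions, not the paper's model).

```python
# Hypothetical CNN + Transformer feature-fusion classifier: pooled ResNet
# features are concatenated with the ViT class embedding before a linear head.
import torch
import torch.nn as nn
from torchvision import models

class HybridCNNTransformer(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        resnet = models.resnet18(weights=None)                    # CNN feature extractor
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])   # drop fc -> (B, 512, 1, 1)
        self.vit = models.vit_b_16(weights=None)                  # transformer branch
        self.vit.heads = nn.Identity()                            # expose the 768-d class embedding
        self.classifier = nn.Linear(512 + 768, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_cnn = self.cnn(x).flatten(1)   # (B, 512)
        f_vit = self.vit(x)              # (B, 768)
        return self.classifier(torch.cat([f_cnn, f_vit], dim=1))

logits = HybridCNNTransformer(num_classes=10)(torch.randn(2, 3, 224, 224))  # placeholder class count
print(logits.shape)  # torch.Size([2, 10])
```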
arXiv Detail & Related papers (2024-08-20T11:05:32Z) - Severity classification of ground-glass opacity via 2-D convolutional
neural network and lung CT scans: a 3-day exploration [0.0]
Ground-glass opacity is a hallmark of numerous lung diseases, including COVID-19, pneumonia, pulmonary fibrosis, and tuberculosis.
This note presents experimental results of a proof-of-concept framework that was implemented and tested over three days for the third challenge, entitled "COVID-19 Competition".
As part of the challenge requirement, the source code produced during the course of this exercise is posted at https://github.com/lisatwyw/cov19.
arXiv Detail & Related papers (2023-03-23T22:35:37Z) - Do We Really Need a Learnable Classifier at the End of Deep Neural
Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as a simplex equiangular tight frame (ETF) and fixed during training.
Our experimental results show that our method achieves similar performance on image classification for balanced datasets.
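For reference, a simplex ETF for K classes in d dimensions can be built as M = sqrt(K/(K-1)) * U * (I_K - (1/K) 1 1^T), with U a random column-orthogonal d x K matrix; a minimal sketch of such a fixed, non-learnable classifier head (the surrounding training recipe is not shown) follows.

```python
# Hypothetical sketch: build a fixed simplex-ETF classifier head whose
# weights never receive gradients. Assumes feature_dim >= num_classes.
import torch
import torch.nn as nn

def simplex_etf(feature_dim: int, num_classes: int) -> torch.Tensor:
    K, d = num_classes, feature_dim
    U, _ = torch.linalg.qr(torch.randn(d, K))        # d x K, orthonormal columns
    center = torch.eye(K) - torch.ones(K, K) / K      # remove the mean direction
    return (K / (K - 1)) ** 0.5 * U @ center          # d x K ETF

class FixedETFHead(nn.Module):
    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        # Registered as a buffer, so the classifier stays fixed during training.
        self.register_buffer("weight", simplex_etf(feature_dim, num_classes))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return features @ self.weight                  # (B, d) x (d, K) -> (B, K) logits

print(FixedETFHead(512, 10)(torch.randn(4, 512)).shape)  # torch.Size([4, 10])
```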
arXiv Detail & Related papers (2022-03-17T04:34:28Z) - UniFormer: Unifying Convolution and Self-attention for Visual
Recognition [69.68907941116127]
Convolutional neural networks (CNNs) and vision transformers (ViTs) have been two dominant frameworks in the past few years.
We propose a novel Unified transFormer (UniFormer) which seamlessly integrates the merits of convolution and self-attention in a concise transformer format.
Our UniFormer achieves 86.3% top-1 accuracy on ImageNet-1K classification.
arXiv Detail & Related papers (2022-01-24T04:39:39Z) - Multiclass Anomaly Detection in GI Endoscopic Images using Optimized
Deep One-class Classification in an Imbalanced Dataset [0.0]
Wireless Capsule Endoscopy helps physicians examine the gastrointestinal (GI) tract noninvasively.
Many available datasets, such as KID2 and Kvasir, suffer from class imbalance, which makes it difficult to train an effective artificial intelligence (AI) system.
In this study, an ensemble of one-class classifiers is used for anomaly detection.
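The specific classifiers and features are not described in this summary; a minimal sketch, assuming pre-extracted frame features and scikit-learn's OneClassSVM as the base detector, could look like:

```python
# Hypothetical sketch: one one-class classifier per abnormality type, each
# trained only on that class's frame features; at test time a frame is
# assigned to whichever detector scores it highest. Features and
# hyperparameters are placeholders, not the paper's configuration.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
class_features = {                        # placeholder per-class training features
    "polyp": rng.normal(size=(200, 64)),
    "bleeding": rng.normal(size=(150, 64)),
}

detectors = {
    name: OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(feats)
    for name, feats in class_features.items()
}

test_frame = rng.normal(size=(1, 64))
scores = {name: det.decision_function(test_frame)[0] for name, det in detectors.items()}
predicted = max(scores, key=scores.get)   # class whose detector accepts the frame most strongly
print(scores, predicted)
```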
arXiv Detail & Related papers (2021-03-15T16:28:42Z) - PS-DeVCEM: Pathology-sensitive deep learning model for video capsule
endoscopy based on weakly labeled data [0.0]
We propose a pathology-sensitive deep learning model (PS-DeVCEM) for frame-level anomaly detection and multi-label classification of different colon diseases in video capsule endoscopy (VCE) data.
Our model is driven by attention-based deep multiple instance learning and is trained end-to-end on weakly labeled data.
We show our model's ability to temporally localize frames with pathologies without frame-level annotations during training.
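As a reference for the attention-based multiple-instance learning mentioned above, a minimal sketch (the embedding network, dimensions, and label count are assumptions) that pools per-frame embeddings with learned attention and reuses the attention weights for temporal localization:

```python
# Hypothetical attention-MIL pooling over frame embeddings: per-frame
# attention weights aggregate a weakly labeled sequence into one video-level
# representation for multi-label classification.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim: int = 512, hidden: int = 128, num_labels: int = 14):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, num_labels)

    def forward(self, frames: torch.Tensor):
        # frames: (num_frames, feat_dim) pre-extracted frame embeddings
        a = torch.softmax(self.attention(frames), dim=0)   # (num_frames, 1) attention weights
        video_repr = (a * frames).sum(dim=0)               # weighted sum over frames
        logits = self.classifier(video_repr)               # multi-label logits
        return logits, a.squeeze(1)                        # attention weights also localize frames

logits, attn = AttentionMIL()(torch.randn(300, 512))
print(logits.shape, attn.shape)  # torch.Size([14]) torch.Size([300])
```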
arXiv Detail & Related papers (2020-11-22T15:33:37Z) - A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights into the patterns learned by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z) - Classification of COVID-19 in CT Scans using Multi-Source Transfer
Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
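The summary does not spell out the multi-source schedule; one plausible reading, sequentially fine-tuning an ImageNet-initialized backbone on each auxiliary source before the COVID-19 target data, is sketched below, with dummy loaders standing in for the real datasets.

```python
# Hypothetical multi-source transfer-learning loop: fine-tune a backbone on
# each auxiliary source dataset in turn, then on the COVID-19 target set.
# Loaders, epochs, and optimizer settings are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

def finetune(model, loader, epochs=1, lr=1e-4):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            criterion(model(images), labels).backward()
            optimizer.step()
    return model

def dummy_loader(n=16):
    # Stand-in for a real CT dataset; replace with actual DataLoaders.
    return DataLoader(TensorDataset(torch.randn(n, 3, 224, 224),
                                    torch.randint(0, 2, (n,))), batch_size=8)

model = models.resnet18(weights=None)              # ImageNet weights would normally be loaded
model.fc = nn.Linear(model.fc.in_features, 2)      # binary COVID-19 / non-COVID head

source_loaders = [dummy_loader(), dummy_loader()]  # auxiliary CT sources
target_loader = dummy_loader()                     # COVID-19 target set

for src in source_loaders:                         # sequential multi-source fine-tuning
    model = finetune(model, src)
model = finetune(model, target_loader)
```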
arXiv Detail & Related papers (2020-09-22T11:53:06Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computation.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Automatic sleep stage classification with deep residual networks in a
mixed-cohort setting [63.52264764099532]
We developed a novel deep neural network model to assess generalizability across several large-scale cohorts.
Overall classification accuracy improved with increasing fractions of training data.
arXiv Detail & Related papers (2020-08-21T10:48:35Z) - A Systematic Approach to Featurization for Cancer Drug Sensitivity
Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq features to be highly redundant and informative, even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.