Related papers: Open Set Recognition for Endoscopic Image Classification: A Deep Learning Approach on the Kvasir Dataset

URL: http://arxiv.org/abs/2506.18284v1
Date: Mon, 23 Jun 2025 04:39:07 GMT
Title: Open Set Recognition for Endoscopic Image Classification: A Deep Learning Approach on the Kvasir Dataset
Authors: Kasra Moazzami, Seoyoun Son, John Lin, Sun Min Lee, Daniel Son, Hayeon Lee, Jeongho Lee, Seongji Lee,
Abstract summary: We evaluate and compare the OSR capabilities of several representative deep learning architectures, including ResNet-50, Swin Transformer, and a hybrid ResNet-Transformer model, under both closed-set and open-set conditions. This work represents one of the first efforts to apply open set recognition to the Kvasir dataset and provides a benchmark for evaluating OSR performance in medical image analysis.
Score: 5.762226441746656
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Endoscopic image classification plays a pivotal role in medical diagnostics by identifying anatomical landmarks and pathological findings. However, conventional closed-set classification frameworks are inherently limited in open-world clinical settings, where previously unseen conditions can arise and compromise model reliability. To address this, we explore the application of Open Set Recognition (OSR) techniques on the Kvasir dataset, a publicly available and diverse endoscopic image collection. In this study, we evaluate and compare the OSR capabilities of several representative deep learning architectures, including ResNet-50, Swin Transformer, and a hybrid ResNet-Transformer model, under both closed-set and open-set conditions. OpenMax is adopted as a baseline OSR method to assess the ability of these models to distinguish known classes from previously unseen categories. This work represents one of the first efforts to apply open set recognition to the Kvasir dataset and provides a foundational benchmark for evaluating OSR performance in medical image analysis. Our results offer practical insights into model behavior in clinically realistic settings and highlight the importance of OSR techniques for the safe deployment of AI systems in endoscopy.

Related papers

Electromagnetic Scattering Kernel Guided Reciprocal Point Learning for SAR Open-Set Recognition [6.226365654670747]
Open Set Recognition (OSR) aims to categorize known classes while denoting unknown ones as "unknown". To enhance open-set SAR classification, a method called scattering kernel with reciprocal learning network is proposed. The proposal is to design convolutional kernels based on large-sized attribute scattering center models.
arXiv Detail & Related papers (2024-11-07T13:26:20Z)
Visual Prompt Engineering for Vision Language Models in Radiology [0.17183214167143138]
Contrastive Language-Image Pretraining (CLIP) offers a promising solution by enabling zero-shot classification through multimodal large-scale pretraining. While CLIP effectively captures global image content, radiology requires a more localized focus on specific pathology regions to enhance both interpretability and diagnostic accuracy. We explore the potential of incorporating visual cues into zero-shot classification, embedding visual markers, such as arrows, bounding boxes, and circles, directly into radiological images to guide model attention.
arXiv Detail & Related papers (2024-08-28T13:53:27Z)
Polar-Net: A Clinical-Friendly Model for Alzheimer's Disease Detection in OCTA Images [53.235117594102675]
Optical Coherence Tomography Angiography is a promising tool for detecting Alzheimer's disease (AD) by imaging the retinal microvasculature. We propose a novel deep-learning framework called Polar-Net to provide interpretable results and leverage clinical prior knowledge. We show that Polar-Net outperforms existing state-of-the-art methods and provides more valuable pathological evidence for the association between retinal vascular changes and AD.
arXiv Detail & Related papers (2023-11-10T11:49:49Z)
RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy [83.4885991036141]
RIDE is a learning-based method for rotation-equivariant detection and invariant description. It is trained in a self-supervised manner on a large curation of endoscopic images. It sets a new state-of-the-art performance on matching and relative pose estimation tasks.
arXiv Detail & Related papers (2023-09-18T08:16:30Z)
K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment [71.27193056354741]
The problem of how to assess cross-modality medical image synthesis has been largely unexplored. We propose a new metric K-CROSS to spur progress on this challenging problem. K-CROSS uses a pre-trained multi-modality segmentation network to predict the lesion location.
arXiv Detail & Related papers (2023-07-10T01:26:48Z)
k-SALSA: k-anonymous synthetic averaging of retinal images via local style alignment [6.36950432352094]
We introduce k-SALSA, a generative adversarial network (GAN)-based framework for synthesizing retinal fundus images. k-SALSA brings together state-of-the-art techniques for training and inverting GANs to achieve practical performance on retinal images. Our work represents a step toward broader sharing of retinal images for scientific collaboration.
arXiv Detail & Related papers (2023-03-20T01:47:04Z)
Learning disentangled representations for explainable chest X-ray classification using Dirichlet VAEs [68.73427163074015]
This study explores the use of the Dirichlet Variational Autoencoder (DirVAE) for learning disentangled latent representations of chest X-ray (CXR) images. The predictive capacity of multi-modal latent representations learned by DirVAE models is investigated through implementation of an auxiliary multi-label classification task.
arXiv Detail & Related papers (2023-02-06T18:10:08Z)
Evaluation of Various Open-Set Medical Imaging Tasks with Deep Neural Networks [15.655519786176438]
We conduct rigorous evaluations amongst state-of-the-art open-set methods, exploring different open-set scenarios. We show the main difference between general domain-trained and medical domain-trained open-set models.
arXiv Detail & Related papers (2021-10-21T04:19:41Z)
Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images. The approach first identifies all lesions present in the image, regardless of sub-type or likelihood of malignancy, then estimates their likelihood of malignancy and, through aggregation, generates an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z)
Intrapapillary Capillary Loop Classification in Magnification Endoscopy: Open Dataset and Baseline Methodology [8.334256673330879]
We build a computer-assisted detection system that can classify still images or video frames as normal or abnormal. We present a new benchmark dataset containing 68K binary labeled frames extracted from 114 patient videos. The proposed method achieved an average accuracy of 91.7% compared to the 94.7% achieved by a group of 12 senior clinicians.
arXiv Detail & Related papers (2021-02-19T14:55:21Z)
Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest. Clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered to be intransparent and difficult to comprehend. We propose a novel decision explanation scheme based on CycleGAN activation which generates high-quality visualizations of classifier decisions even in smaller data sets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.