A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT
- URL: http://arxiv.org/abs/2410.19944v1
- Date: Fri, 25 Oct 2024 19:42:57 GMT
- Title: A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT
- Authors: Nagarajan Ganapathy, Podakanti Satyajith Chary, Teja Venkata Ramana Kumar Pithani, Pavan Kavati, Arun Kumar S,
- Abstract summary: This Paper presents an advanced approach for fine-tuning BiomedCLIP PubMedBERT, a multimodal model, to classify abnormalities in Video Capsule Endoscopy frames.
Our method categorizes images into ten specific classes: angioectasia, bleeding, erosion, erythema, foreign body, lymphangiectasia, polyp, ulcer, worms, and normal.
Performance metrics, including classification, accuracy, recall, and F1 score, indicate the models strong ability to accurately identify abnormalities in endoscopic frames.
- Score: 0.62914438169038
- License:
- Abstract: This Paper presents an advanced approach for fine-tuning BiomedCLIP PubMedBERT, a multimodal model, to classify abnormalities in Video Capsule Endoscopy (VCE) frames, aiming to enhance diagnostic efficiency in gastrointestinal healthcare. By integrating the PubMedBERT language model with a Vision Transformer (ViT) to process endoscopic images, our method categorizes images into ten specific classes: angioectasia, bleeding, erosion, erythema, foreign body, lymphangiectasia, polyp, ulcer, worms, and normal. Our workflow incorporates image preprocessing and fine-tunes the BiomedCLIP model to generate high-quality embeddings for both visual and textual inputs, aligning them through similarity scoring for classification. Performance metrics, including classification, accuracy, recall, and F1 score, indicate the models strong ability to accurately identify abnormalities in endoscopic frames, showing promise for practical use in clinical diagnostics.
Related papers
- Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning [5.660131312162423]
This paper proposes a novel Cross-Graph Modal Contrastive Learning framework for multimodal medical image classification.
The proposed approach is evaluated on two datasets: a Parkinson's disease (PD) dataset and a public melanoma dataset.
Results demonstrate that CGMCL outperforms conventional unimodal methods in accuracy, interpretability, and early disease prediction.
arXiv Detail & Related papers (2024-10-23T01:25:25Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - A Novel Vision Transformer with Residual in Self-attention for
Biomedical Image Classification [8.92307560991779]
This article presents the novel framework of multi-head self-attention for vision transformer (ViT)
The proposed method uses the concept of residual connection for accumulating the best attention output in each block of multi-head attention.
The results show the significant improvement over traditional ViT and other convolution based state-of-the-art classification models.
arXiv Detail & Related papers (2023-06-02T15:06:14Z) - DiffMIC: Dual-Guidance Diffusion Network for Medical Image
Classification [32.67098520984195]
We propose the first diffusion-based model (named DiffMIC) to address general medical image classification.
Our experimental results demonstrate that DiffMIC outperforms state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2023-03-19T09:15:45Z) - Endoscopy Classification Model Using Swin Transformer and Saliency Map [11.031841470875571]
We propose a new multi-label classification method, which considers two aspects of learning approaches (local and global views) for endoscopic image classification.
The results demonstrate that this method performed well for endoscopic medical images by utilizing local and global features of the images.
arXiv Detail & Related papers (2023-03-12T19:36:31Z) - MedSegDiff-V2: Diffusion based Medical Image Segmentation with
Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG)
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - Malignancy Prediction and Lesion Identification from Clinical
Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images.
We first identify all lesions present in the image regardless of sub-type or likelihood of malignancy, then it estimates their likelihood of malignancy, and through aggregation, it also generates an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z) - Co-Heterogeneous and Adaptive Segmentation from Multi-Source and
Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion
Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogenous and adaptive segmentation (CHASe)
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2% sim 9.4%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z) - Semi-supervised Medical Image Classification with Relation-driven
Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.