Colonoscopy Landmark Detection using Vision Transformers
- URL: http://arxiv.org/abs/2209.11304v1
- Date: Thu, 22 Sep 2022 20:39:07 GMT
- Title: Colonoscopy Landmark Detection using Vision Transformers
- Authors: Aniruddha Tamhane and Tse'ela Mida and Erez Posner and Moshe Bouhnik
- Abstract summary: We have collected a dataset of 120 videos and 2416 snapshots taken during the procedure.
We have developed a novel, vision-transformer based landmark detection algorithm.
We report an accuracy of 82% with the vision transformer backbone on a test dataset of snapshots.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Colonoscopy is a routine outpatient procedure used to examine the colon and
rectum for any abnormalities including polyps, diverticula and narrowing of
colon structures. A significant amount of the clinician's time is spent in
post-processing snapshots taken during the colonoscopy procedure, for
maintaining medical records or further investigation. Automating this step can
save time and improve the efficiency of the process. In our work, we have
collected a dataset of 120 colonoscopy videos and 2416 snapshots taken during
the procedure, which have been annotated by experts. Further, we have developed
a novel, vision-transformer based landmark detection algorithm that identifies
key anatomical landmarks (the appendiceal orifice, ileocecal valve/cecum
landmark and rectum retroflexion) from snapshots taken during colonoscopy. Our
algorithm uses an adaptive gamma correction during preprocessing to maintain a
consistent brightness for all images. We then use a vision transformer as the
feature extraction backbone and a fully connected network based classifier head
to categorize a given frame into four classes: the three landmarks or a
non-landmark frame. We compare the vision transformer (ViT-B/16) backbone with
ResNet-101 and ConvNext-B backbones that have been trained similarly. We report
an accuracy of 82% with the vision transformer backbone on a test dataset of
snapshots.
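The abstract mentions an adaptive gamma correction step that normalizes brightness before feature extraction. The paper's exact formulation is not given here, so the following is a minimal illustrative sketch using a common heuristic: choose gamma so that the image's mean intensity is pushed toward a target brightness.

```python
import numpy as np

def adaptive_gamma_correction(image: np.ndarray, target_mean: float = 0.5) -> np.ndarray:
    """Brightness-normalize an image in [0, 1] via an adaptively chosen gamma.

    Uses the heuristic gamma = log(target_mean) / log(mean_intensity), which maps
    the current mean brightness toward target_mean. This is an assumed stand-in
    for the paper's (unspecified) adaptive scheme, not its actual method.
    """
    mean = float(image.mean())
    # Guard against degenerate all-black / all-white inputs (log(0) or log(1)).
    mean = min(max(mean, 1e-6), 1.0 - 1e-6)
    gamma = np.log(target_mean) / np.log(mean)
    return np.clip(image ** gamma, 0.0, 1.0)

# Example: a uniformly dark frame (mean 0.2) is brightened toward mean 0.5.
dark = np.full((8, 8), 0.2)
corrected = adaptive_gamma_correction(dark)
```

After this normalization, each frame would be fed to the backbone (ViT-B/16 in the paper) and a fully connected head producing four logits (three landmarks plus non-landmark).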
Related papers
- Frontiers in Intelligent Colonoscopy [96.57251132744446]
This study investigates the frontiers of intelligent colonoscopy techniques and their prospective implications for multimodal medical applications.
We assess the current data-centric and model-centric landscapes through four tasks for colonoscopic scene perception.
To embrace the coming multimodal era, we establish three foundational initiatives: a large-scale multimodal instruction tuning dataset ColonINST, a colonoscopy-designed multimodal language model ColonGPT, and a multimodal benchmark.
arXiv Detail & Related papers (2024-10-22T17:57:12Z) - Intraoperative Registration by Cross-Modal Inverse Neural Rendering [61.687068931599846]
We present a novel approach for 3D/2D intraoperative registration during neurosurgery via cross-modal inverse neural rendering.
Our approach separates implicit neural representation into two components, handling anatomical structure preoperatively and appearance intraoperatively.
We tested our method on retrospective patients' data from clinical cases, showing that our method outperforms state-of-the-art while meeting current clinical standards for registration.
arXiv Detail & Related papers (2024-09-18T13:40:59Z) - Semantic Parsing of Colonoscopy Videos with Multi-Label Temporal
Networks [2.788533099191487]
We present a method for automatic semantic parsing of colonoscopy videos.
The method uses a novel DL multi-label temporal segmentation model trained in supervised and unsupervised regimes.
We evaluate the accuracy of the method on a test set of over 300 annotated colonoscopy videos, and use ablation to explore the relative importance of the method's various components.
arXiv Detail & Related papers (2023-06-12T08:46:02Z) - ColonMapper: topological mapping and localization for colonoscopy [7.242530499990028]
We propose a topological mapping and localization system able to operate on real human colonoscopies.
The map is a graph where each node encodes a colon location by a set of real images, while edges represent traversability between nodes.
Experiments show that ColonMapper is able to autonomously build a map and localize against it in two important use cases.
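The map structure described above (nodes holding image sets, edges marking traversability) can be sketched as a simple adjacency structure. The class and method names below are hypothetical illustrations, not ColonMapper's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ColonNode:
    """A colon location represented by the set of real images observed there."""
    node_id: int
    images: set = field(default_factory=set)

class TopologicalMap:
    """Graph of colon locations; undirected edges mean the robot/scope can traverse between them."""

    def __init__(self):
        self.nodes = {}
        self.edges = set()  # undirected links stored as frozensets of node ids

    def add_node(self, node_id, images=()):
        self.nodes[node_id] = ColonNode(node_id, set(images))

    def connect(self, a, b):
        self.edges.add(frozenset((a, b)))

    def traversable(self, a, b):
        return frozenset((a, b)) in self.edges

# Example: two locations linked by an observed traversal.
m = TopologicalMap()
m.add_node(0, images={"frame_001.png"})
m.add_node(1, images={"frame_042.png"})
m.connect(0, 1)
```

Localization against such a map would then amount to matching a query image to a node's image set, a detail the summary leaves to the paper.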
arXiv Detail & Related papers (2023-05-09T15:32:50Z) - FetReg2021: A Challenge on Placental Vessel Segmentation and
Registration in Fetoscopy [52.3219875147181]
Fetoscopic laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS)
The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination.
Computer-assisted intervention (CAI) can provide surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking.
Seven teams participated in this challenge and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fet
arXiv Detail & Related papers (2022-06-24T23:44:42Z) - Bimodal Camera Pose Prediction for Endoscopy [23.12495584329767]
We propose SimCol, a synthetic dataset for camera pose estimation in colonoscopy.
Our dataset replicates real colonoscope motion and highlights the drawbacks of existing methods.
We publish 18k RGB images from simulated colonoscopy with corresponding depth and camera poses and make our data generation environment in Unity publicly available.
arXiv Detail & Related papers (2022-04-11T09:34:34Z) - CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for
Non-Contrast to Contrast CT Translation [56.622832383316215]
We propose a novel approach to translate unpaired contrast computed tomography (CT) scans to non-contrast CT scans.
Our approach is based on cycle-consistent generative adversarial convolutional transformers, for short, CyTran.
Our empirical results show that CyTran outperforms all competing methods.
arXiv Detail & Related papers (2021-10-12T23:25:03Z) - Deep Learning-based Biological Anatomical Landmark Detection in
Colonoscopy Videos [21.384094148149003]
We propose a novel deep learning-based approach to detect biological anatomical landmarks in colonoscopy videos.
Average detection accuracy reaches 99.75%, while the average IoU of 0.91 shows a high degree of similarity between our predicted landmark periods and ground truth.
arXiv Detail & Related papers (2021-08-06T05:52:32Z) - FoldIt: Haustral Folds Detection and Segmentation in Colonoscopy Videos [6.187780920448871]
Haustral folds are colon wall protrusions implicated for high polyp miss rate during optical colonoscopy procedures.
We present a novel generative adversarial network, FoldIt, for feature-consistent image translation of optical colonoscopy videos to virtual colonoscopy renderings with haustral fold overlays.
arXiv Detail & Related papers (2021-06-23T16:41:10Z) - Colonoscopy Polyp Detection: Domain Adaptation From Medical Report
Images to Real-time Videos [76.37907640271806]
We propose an Image-video-joint polyp detection network (Ivy-Net) to address the domain gap between colonoscopy images from historical medical reports and real-time videos.
Experiments on the collected dataset demonstrate that our Ivy-Net achieves the state-of-the-art result on colonoscopy video.
arXiv Detail & Related papers (2020-12-31T10:33:09Z) - Assisted Probe Positioning for Ultrasound Guided Radiotherapy Using
Image Sequence Classification [55.96221340756895]
Effective transperineal ultrasound image guidance in prostate external beam radiotherapy requires consistent alignment between probe and prostate at each session during patient set-up.
We demonstrate a method for ensuring accurate probe placement through joint classification of images and probe position data.
Using a multi-input multi-task algorithm, spatial coordinate data from an optically tracked ultrasound probe is combined with an image classifier using a recurrent neural network to generate two sets of predictions in real-time.
The algorithm identified optimal probe alignment within a mean (standard deviation) range of 3.7° (1.2°) from
arXiv Detail & Related papers (2020-10-06T13:55:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.