Semantic Parsing of Colonoscopy Videos with Multi-Label Temporal
Networks
- URL: http://arxiv.org/abs/2306.06960v2
- Date: Tue, 22 Aug 2023 11:31:21 GMT
- Title: Semantic Parsing of Colonoscopy Videos with Multi-Label Temporal
Networks
- Authors: Ori Kelner, Or Weinstein, Ehud Rivlin, and Roman Goldenberg
- Abstract summary: We present a method for automatic semantic parsing of colonoscopy videos.
The method uses a novel DL multi-label temporal segmentation model trained in supervised and unsupervised regimes.
We evaluate the accuracy of the method on a test set of over 300 annotated colonoscopy videos, and use ablation to explore the relative importance of the method's components.
- Score: 2.788533099191487
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Following the successful debut of polyp detection and characterization, more
advanced automation tools are being developed for colonoscopy. The new
automation tasks, such as quality metrics or report generation, require
understanding of the procedure flow that includes activities, events,
anatomical landmarks, etc. In this work we present a method for automatic
semantic parsing of colonoscopy videos. The method uses a novel DL multi-label
temporal segmentation model trained in supervised and unsupervised regimes. We
evaluate the accuracy of the method on a test set of over 300 annotated
colonoscopy videos, and use ablation to explore the relative importance of
the method's components.
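The paper's exact architecture is not given here; as a rough illustration only, a multi-label temporal segmentation model can be sketched as per-frame features passed through temporal convolutions, with an independent sigmoid per label at every frame, so that overlapping activities, events, and landmarks can be active simultaneously. All class names, layer sizes, and the label count below are hypothetical, not the authors' design.

```python
# Minimal sketch of a multi-label temporal segmentation head (hypothetical,
# not the authors' architecture): per-frame features go through temporal
# convolutions, and each frame gets independent per-label probabilities.
import torch
import torch.nn as nn

class MultiLabelTemporalSegmenter(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_labels=10):
        super().__init__()
        # 1-D convolutions over time mix information across neighboring frames.
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
        )
        self.head = nn.Conv1d(hidden, num_labels, kernel_size=1)

    def forward(self, frame_feats):           # (batch, time, feat_dim)
        x = frame_feats.transpose(1, 2)       # (batch, feat_dim, time)
        x = self.temporal(x)
        logits = self.head(x)                 # (batch, num_labels, time)
        return logits.transpose(1, 2)         # (batch, time, num_labels)

# Multi-label means labels may overlap per frame, so training uses an
# independent sigmoid per label (BCE), not a softmax over labels.
model = MultiLabelTemporalSegmenter()
feats = torch.randn(2, 100, 512)              # 2 clips, 100 frames each
targets = torch.randint(0, 2, (2, 100, 10)).float()
loss = nn.BCEWithLogitsLoss()(model(feats), targets)
```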
Related papers
- Frontiers in Intelligent Colonoscopy [96.57251132744446]
This study investigates the frontiers of intelligent colonoscopy techniques and their prospective implications for multimodal medical applications.
We assess the current data-centric and model-centric landscapes through four tasks for colonoscopic scene perception.
To embrace the coming multimodal era, we establish three foundational initiatives: a large-scale multimodal instruction tuning dataset ColonINST, a colonoscopy-designed multimodal language model ColonGPT, and a multimodal benchmark.
arXiv Detail & Related papers (2024-10-22T17:57:12Z)
- SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation [4.027361638728112]
We propose a video polyp segmentation method that uses self-supervised learning as an auxiliary task and a spatial-temporal self-attention mechanism for improved representation learning.
Our experimental results demonstrate improvements over several state-of-the-art (SOTA) methods.
Our ablation study confirms that the proposed joint end-to-end training improves network accuracy by over 3% and nearly 10% on the Dice similarity coefficient and intersection-over-union, respectively (both metrics are defined in the sketch below).
arXiv Detail & Related papers (2024-06-14T17:33:11Z)
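The two overlap metrics quoted above are standard; for reference, a minimal implementation (not taken from the SSTFB code) is:

```python
# Dice similarity coefficient and intersection-over-union (IoU) for binary
# segmentation masks: Dice = 2|A∩B|/(|A|+|B|), IoU = |A∩B|/|A∪B|.
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """pred, target: boolean arrays of the same shape."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

pred = np.zeros((64, 64), dtype=bool); pred[10:40, 10:40] = True
gt = np.zeros((64, 64), dtype=bool); gt[20:50, 20:50] = True
print(dice_and_iou(pred, gt))  # Dice is always >= IoU for non-empty masks
```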
- Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Guidewire Segmentation in Robot-Assisted Cardiovascular Catheterization [4.894147633944561]
We propose a weakly-supervised learning method with multi-lateral pseudo labeling for tool segmentation in cardiac angiograms.
We trained the model end-to-end with weakly-annotated data obtained during robotic cardiac catheterization.
Compared to three existing weakly-supervised methods, our approach yielded higher segmentation performance across three different cardiac angiogram datasets (a sketch of the pseudo-labeling idea follows below).
arXiv Detail & Related papers (2024-04-11T09:23:44Z)
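The abstract does not spell out how the lateral decoder branches generate pseudo labels; one common recipe, sketched below purely as an illustration (the function name and threshold are hypothetical, not the paper's scheme), averages branch predictions into a consensus mask that then supervises every branch.

```python
# Hypothetical sketch of multi-branch pseudo labeling: predictions from
# several decoder branches are averaged and thresholded into a pseudo mask
# that supervises each branch on weakly-annotated frames.
import torch

def pseudo_label(branch_logits, threshold=0.5):
    """branch_logits: list of (batch, 1, H, W) logits, one per decoder branch."""
    probs = torch.stack([torch.sigmoid(l) for l in branch_logits], dim=0)
    consensus = probs.mean(dim=0)                 # average branch confidence
    return (consensus > threshold).float()        # hard pseudo mask

branches = [torch.randn(2, 1, 32, 32) for _ in range(3)]
mask = pseudo_label(branches)
bce = torch.nn.BCEWithLogitsLoss()
loss = sum(bce(l, mask) for l in branches) / len(branches)
```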
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- Self-Supervised Polyp Re-Identification in Colonoscopy [1.9678816712224196]
We propose a robust long-term polyp tracking method based on re-identification by visual appearance.
Our solution uses an attention-based self-supervised ML model, specifically designed to leverage the temporal nature of video input (a generic matching sketch follows below).
arXiv Detail & Related papers (2023-06-14T15:53:54Z)
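The re-identification model itself is not reproduced here; the sketch below shows only the generic appearance-matching recipe such systems typically build on (the embedding dimension and threshold are hypothetical).

```python
# Generic appearance re-identification by embedding similarity: each polyp
# crop maps to an L2-normalized embedding, and a new detection is matched to
# the most similar stored track if similarity clears a threshold.
import torch
import torch.nn.functional as F

def match(query_emb, gallery_embs, threshold=0.8):
    """query_emb: (d,), gallery_embs: (n, d); all L2-normalized."""
    sims = gallery_embs @ query_emb                  # cosine similarities
    best = int(sims.argmax())
    return best if sims[best] > threshold else None  # None => new polyp

gallery = F.normalize(torch.randn(5, 128), dim=1)    # embeddings of known polyps
query = F.normalize(torch.randn(128), dim=0)
print(match(query, gallery))
```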
- Colonoscopy Landmark Detection using Vision Transformers [0.0]
We have collected a dataset of 120 videos and 2416 snapshots taken during the procedure.
We have developed a novel vision-transformer-based landmark detection algorithm (a minimal classification sketch follows below).
We report an accuracy of 82% with the vision transformer backbone on a test dataset of snapshots.
arXiv Detail & Related papers (2022-09-22T20:39:07Z)
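The paper's backbone and head are not specified in this summary; the sketch below uses torchvision's ViT-B/16 purely as a stand-in, with a hypothetical landmark count.

```python
# Minimal sketch of landmark classification from snapshots with a ViT
# backbone, using torchvision's ViT-B/16 as a stand-in for the paper's
# (unspecified) backbone.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

NUM_LANDMARKS = 4                              # hypothetical label count
model = vit_b_16(weights=None)                 # or load pretrained weights
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_LANDMARKS)

snapshots = torch.randn(8, 3, 224, 224)        # batch of procedure snapshots
logits = model(snapshots)                      # (8, NUM_LANDMARKS)
pred = logits.argmax(dim=1)                    # predicted landmark per snapshot
```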
- FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment [93.09267863425492]
We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable.
We construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures.
arXiv Detail & Related papers (2022-04-07T17:59:32Z)
- Learning To Recognize Procedural Activities with Distant Supervision [96.58436002052466]
We consider the problem of classifying fine-grained, multi-step activities from long videos spanning up to several minutes.
Our method uses a language model to match noisy, automatically-transcribed speech from the video to step descriptions in the knowledge base (a matching sketch follows below).
arXiv Detail & Related papers (2022-01-26T15:06:28Z)
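As a rough illustration of the matching step, the sketch below embeds a made-up transcript line and knowledge-base steps with Sentence-Transformers, used here only as a stand-in for the paper's language model, and picks the most similar step.

```python
# Sketch of distant supervision by text matching: automatically transcribed
# speech is matched to knowledge-base step descriptions via embedding
# similarity (steps and transcript below are invented examples).
from sentence_transformers import SentenceTransformer
import numpy as np

steps = [                                     # hypothetical knowledge-base steps
    "Crack the eggs into a bowl",
    "Whisk the eggs with milk",
    "Pour the mixture into the pan",
]
transcript = "okay now we whisk everything together with a splash of milk"

model = SentenceTransformer("all-MiniLM-L6-v2")
step_emb = model.encode(steps, normalize_embeddings=True)
asr_emb = model.encode([transcript], normalize_embeddings=True)

scores = step_emb @ asr_emb.T                 # cosine similarity per step
print(steps[int(np.argmax(scores))])          # noisy speech -> matched step
```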
- Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos [76.37907640271806]
We propose an Image-video-joint polyp detection network (Ivy-Net) to address the domain gap between colonoscopy images from historical medical reports and real-time videos.
Experiments on the collected dataset demonstrate that our Ivy-Net achieves state-of-the-art results on colonoscopy videos.
arXiv Detail & Related papers (2020-12-31T10:33:09Z)
- A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos [126.66212285239624]
We propose a benchmark of structured procedural knowledge extracted from cooking videos.
Our manually annotated open-vocabulary resource includes 356 instructional cooking videos and 15,523 video clip/sentence-level annotations.
arXiv Detail & Related papers (2020-05-02T05:15:20Z)