Unsupervised Shot Boundary Detection for Temporal Segmentation of Long
Capsule Endoscopy Videos
- URL: http://arxiv.org/abs/2110.09067v1
- Date: Mon, 18 Oct 2021 07:22:46 GMT
- Title: Unsupervised Shot Boundary Detection for Temporal Segmentation of Long
Capsule Endoscopy Videos
- Authors: Sodiq Adewole, Philip Fernandes, James Jablonski, Andrew Copland,
Michael Porter, Sana Syed, Donald Brown
- Abstract summary: Physicians use Capsule Endoscopy (CE) as a non-invasive and non-surgical procedure to examine the entire gastrointestinal (GI) tract.
A single CE examination can last between 8 and 11 hours, generating up to 80,000 frames that are compiled into a video.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Physicians use Capsule Endoscopy (CE) as a non-invasive and
non-surgical procedure to examine the entire gastrointestinal (GI) tract for
diseases and abnormalities. A single CE examination can last between 8 and 11
hours, generating up to 80,000 frames that are compiled into a video.
Physicians have to review and analyze the entire video to identify
abnormalities or diseases before making a diagnosis. This review task can be
very tedious, time-consuming, and prone to error. While as little as a single
frame may capture content relevant to the physicians' final diagnosis, the
frames covering the small bowel region alone can number as many as 50,000. To
minimize physicians' review time and effort, this paper proposes a novel
unsupervised and computationally efficient temporal segmentation method to
automatically partition long CE videos into homogeneous and identifiable
video segments. However, searching for temporal boundaries in a long video
using a high-dimensional frame-feature matrix is computationally prohibitive
and impractical for real clinical applications. Therefore, leveraging both
spatial and temporal information in the video, we first extracted high-level
frame features using a pretrained CNN model and then projected the
high-dimensional frame-feature matrix to a 1-dimensional embedding. Using
this 1-dimensional sequence embedding, we applied the Pruned Exact Linear
Time (PELT) algorithm to search for temporal boundaries that indicate the
transition points from normal to abnormal frames and vice versa. We
experimented with multiple real patients' CE videos, and our model achieved
an AUC of 66% on multiple test videos against expert-provided labels.
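For context, PELT (Pruned Exact Linear Time; Killick et al., 2012) solves the penalized change point problem

$$\min_{m,\;\tau_1<\dots<\tau_m}\ \sum_{i=1}^{m+1} \mathcal{C}\big(y_{(\tau_{i-1}+1):\tau_i}\big) + \beta m,$$

where $y_{1:n}$ is the 1-dimensional sequence embedding, $\tau_0=0$ and $\tau_{m+1}=n$ are fixed endpoints, $\mathcal{C}$ is a segment cost measuring within-segment homogeneity, and the penalty $\beta$ discourages spurious boundaries. A pruning rule discards candidate change points that can never be optimal, giving an expected linear-time search. The abstract does not state which cost model or penalty the authors used, so both are left generic here.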
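The abstract's pipeline (pretrained-CNN frame features, projection to a 1-D embedding, PELT boundary search) can be sketched with off-the-shelf components. Below is a minimal sketch assuming a ResNet-50 backbone, PCA as the dimensionality reduction, and the `ruptures` implementation of PELT; the paper does not specify the backbone, projection method, cost model, or penalty, so treat all of these as illustrative placeholders rather than the authors' exact setup.

```python
# Sketch of the pipeline described in the abstract:
# frames -> pretrained-CNN features -> 1-D embedding -> PELT boundaries.
# Assumptions (not stated in the paper): ResNet-50 backbone, PCA projection,
# RBF cost model, and penalty value.
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.decomposition import PCA
import ruptures as rpt

# Pretrained CNN used purely as a frame-feature extractor (2048-d per frame).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()  # drop the classification head
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frames: list[np.ndarray]) -> np.ndarray:
    """Map HxWx3 uint8 RGB frames to an (n_frames, 2048) feature matrix."""
    # For a full-length CE video this would be done in batches, on GPU.
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch).numpy()

def detect_boundaries(frames: list[np.ndarray], penalty: float = 10.0) -> list[int]:
    # Project the high-dimensional frame-feature matrix to a 1-D embedding.
    features = extract_features(frames)
    embedding = PCA(n_components=1).fit_transform(features)  # (n_frames, 1)
    # PELT searches the 1-D sequence for change points in roughly linear time.
    algo = rpt.Pelt(model="rbf", min_size=5, jump=1).fit(embedding)
    return algo.predict(pen=penalty)  # sorted boundary indices; last entry is n_frames
```

Larger `pen` values yield fewer, more confident boundaries; on an 80,000-frame examination both the batching of feature extraction and the penalty would need tuning per video, which is exactly the computational trade-off the abstract highlights.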
Related papers
- Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collect and annotate the first benchmark dataset that covers diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z) - Is Two-shot All You Need? A Label-efficient Approach for Video
Segmentation in Breast Ultrasound [4.113689581316844]
We propose a novel two-shot training paradigm for BUS video segmentation.
It not only captures free-range space-time consistency but also utilizes a source-dependent augmentation scheme.
Results showed that it achieved performance comparable to fully annotated training given only 1.9% of the training labels.
arXiv Detail & Related papers (2024-02-07T14:47:08Z) - Vivim: a Video Vision Mamba for Medical Video Segmentation [52.11785024350253]
This paper presents a Video Vision Mamba-based framework, dubbed as Vivim, for medical video segmentation tasks.
Our Vivim can effectively compress the long-term representation into sequences at varying scales.
Experiments on thyroid segmentation, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of our Vivim.
arXiv Detail & Related papers (2024-01-25T13:27:03Z) - Dynamic Erasing Network Based on Multi-Scale Temporal Features for
Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z) - A spatio-temporal network for video semantic segmentation in surgical
videos [11.548181453080087]
We propose a novel architecture for modelling temporal relationships in videos.
The proposed model includes a decoder to enable semantic video segmentation.
The proposed decoder can be used on top of any segmentation encoder to improve temporal consistency.
arXiv Detail & Related papers (2023-06-19T16:36:48Z) - YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast
Video Polyp Detection [80.68520401539979]
YONA (You Only Need one Adjacent Reference-frame) is an efficient end-to-end training framework for video polyp detection.
Our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
arXiv Detail & Related papers (2023-06-06T13:53:15Z) - FetReg2021: A Challenge on Placental Vessel Segmentation and
Registration in Fetoscopy [52.3219875147181]
Fetoscopic laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS).
The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination.
Computer-assisted intervention (CAI) can provide surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking.
Seven teams participated in this challenge and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fet
arXiv Detail & Related papers (2022-06-24T23:44:42Z) - Graph Convolution Neural Network For Weakly Supervised Abnormality
Localization In Long Capsule Endoscopy Videos [0.0]
We propose an end-to-end temporal abnormality localization for long WCE videos using only weak video level labels.
Our method achieved an accuracy of 89.9% on the graph classification task and a specificity of 97.5% on the abnormal frames localization task.
arXiv Detail & Related papers (2021-10-18T09:00:24Z) - Ultrasound Video Transformers for Cardiac Ejection Fraction Estimation [3.188100483042461]
We propose a novel approach to ultrasound video analysis using a Residual Auto-Encoder Network and a BERT model adapted for token classification.
We apply our model to the task of End-Systolic (ES) and End-Diastolic (ED) frame detection and the automated computation of the left ventricular ejection fraction.
Our end-to-end learnable approach can estimate the ejection fraction with an MAE of 5.95 and $R^2$ of 0.52 in 0.15s per video, showing that segmentation is not the only way to predict ejection fraction.
arXiv Detail & Related papers (2021-07-02T11:23:09Z) - Colonoscopy Polyp Detection: Domain Adaptation From Medical Report
Images to Real-time Videos [76.37907640271806]
We propose an Image-video-joint polyp detection network (Ivy-Net) to address the domain gap between colonoscopy images from historical medical reports and real-time videos.
Experiments on the collected dataset demonstrate that our Ivy-Net achieves the state-of-the-art result on colonoscopy video.
arXiv Detail & Related papers (2020-12-31T10:33:09Z) - PS-DeVCEM: Pathology-sensitive deep learning model for video capsule
endoscopy based on weakly labeled data [0.0]
We propose a pathology-sensitive deep learning model (PS-DeVCEM) for frame-level anomaly detection and multi-label classification of different colon diseases in video capsule endoscopy (VCE) data.
Our model is driven by attention-based deep multiple instance learning and is trained end-to-end on weakly labeled data.
We show our model's ability to temporally localize frames with pathologies, without frame annotation information during training.
arXiv Detail & Related papers (2020-11-22T15:33:37Z)