Deep Transformers for Fast Small Intestine Grounding in Capsule
Endoscope Video
- URL: http://arxiv.org/abs/2104.02866v1
- Date: Wed, 7 Apr 2021 02:35:18 GMT
- Title: Deep Transformers for Fast Small Intestine Grounding in Capsule
Endoscope Video
- Authors: Xinkai Zhao, Chaowei Fang, Feng Gao, De-Jun Fan, Xutao Lin, Guanbin Li
- Abstract summary: We propose a deep model to ground the shooting range of the small intestine in a capsule endoscope video.
This is the first attempt to tackle the small intestine grounding task with a deep neural network.
- Score: 42.84449937667722
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Capsule endoscopy is a revolutionary technique for examining and
diagnosing intractable gastrointestinal diseases. Because of the huge amount of
data, analyzing capsule endoscope videos is very time-consuming and
labor-intensive for gastroenterologists. Intelligent long-video analysis
algorithms for regional positioning and analysis of capsule endoscopic video
are therefore essential to reduce the workload of clinicians and help improve
the accuracy of disease diagnosis. In this paper, we propose a deep model to
ground the shooting range of the small intestine in a capsule endoscope video,
which can last tens of hours. This is the first attempt to tackle the small
intestine grounding task with a deep neural network. We model the task as a
3-way classification problem, in which every video frame is categorized as
esophagus/stomach, small intestine, or colorectum. To explore long-range
temporal dependency, a transformer module is built to fuse features of multiple
neighboring frames. Based on the classification model, we devise a search
algorithm that locates the starting and ending shooting boundaries of the small
intestine efficiently. Instead of searching the full video exhaustively, the
method iteratively bisects the current video segment and moves toward the
target boundary. We collect 113 videos from a local hospital to validate our
method. In 5-fold cross validation, the average IoU between the small intestine
segments located by our method and the ground truths annotated by
board-certified gastroenterologists reaches 0.945.
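The abstract describes two components: a transformer that fuses features of
neighboring frames for 3-way per-frame classification, and a bisection-style
search that locates the small-intestine boundaries without classifying every
frame. Below is a minimal PyTorch sketch of both ideas; it is not the authors'
code, and names such as FrameClassifier, classify, and locate_boundary, along
with the feature dimension and window size, are illustrative assumptions.

```python
# Sketch only: (1) transformer fusion of neighboring-frame features for
# 3-way classification; (2) binary search for the label-change boundaries.
import torch
import torch.nn as nn

NUM_CLASSES = 3  # 0: esophagus/stomach, 1: small intestine, 2: colorectum

class FrameClassifier(nn.Module):
    """Fuses a window of per-frame features with self-attention and
    predicts the class of the window's center frame."""
    def __init__(self, feat_dim=512, n_heads=8, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, NUM_CLASSES)

    def forward(self, window_feats):        # (batch, n_frames, feat_dim)
        fused = self.encoder(window_feats)  # long-range temporal fusion
        center = fused[:, fused.size(1) // 2]
        return self.head(center)            # logits for the center frame

def classify(frame_idx: int) -> int:
    """Placeholder: extract features around frame_idx, run FrameClassifier,
    and return the argmax label."""
    raise NotImplementedError

def locate_boundary(lo: int, hi: int, before_label: int) -> int:
    """Binary search for the first frame whose label differs from
    before_label, assuming labels change monotonically along the video
    (esophagus/stomach -> small intestine -> colorectum) and that
    classify(lo) == before_label while classify(hi) != before_label."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if classify(mid) == before_label:
            lo = mid   # boundary lies to the right of mid
        else:
            hi = mid   # boundary lies at or to the left of mid
    return hi

# Usage (hypothetical):
# start = locate_boundary(0, n_frames - 1, before_label=0)      # enters small intestine
# end   = locate_boundary(start, n_frames - 1, before_label=1)  # enters colorectum
```

Each probe costs one classifier call, so both boundaries are found in
O(log N) calls instead of N; the paper's actual search must also tolerate
occasionally misclassified frames, which this noiseless sketch ignores.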
Related papers
- 3D Reconstruction of the Human Colon from Capsule Endoscope Video [2.3513645401551337]
We investigate the possibility of constructing 3D models of whole sections of the human colon using image sequences from wireless capsule endoscope video.
Recent developments of virtual graphics-based models of the human gastrointestinal system, where distortion and artifacts can be enabled or disabled, make it possible to "dissect" the problem.
arXiv Detail & Related papers (2024-07-21T17:31:38Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - CodaMal: Contrastive Domain Adaptation for Malaria Detection in Low-Cost Microscopes [51.5625352379093]
Malaria is a major health issue worldwide, and its diagnosis requires scalable solutions that work effectively with low-cost microscopes (LCM).
Deep learning-based methods have shown success in computer-aided diagnosis from microscopic images.
These methods need annotated images that show cells affected by malaria parasites and their life stages.
Annotating images from LCM significantly increases the burden on medical experts compared to annotating images from high-cost microscopes (HCM).
arXiv Detail & Related papers (2024-02-16T06:57:03Z) - AiAReSeg: Catheter Detection and Segmentation in Interventional
Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed under the gold standard of fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z) - Unsupervised Shot Boundary Detection for Temporal Segmentation of Long
Capsule Endoscopy Videos [0.0]
Physicians use Capsule Endoscopy (CE) as a non-invasive and non-surgical procedure to examine the entire gastrointestinal (GI) tract.
A single CE examination can last between 8 and 11 hours, generating up to 80,000 frames that are compiled into a video.
arXiv Detail & Related papers (2021-10-18T07:22:46Z) - Deep Learning-based Biological Anatomical Landmark Detection in
Colonoscopy Videos [21.384094148149003]
We propose a novel deep learning-based approach to detect biological anatomical landmarks in colonoscopy videos.
Average detection accuracy reaches 99.75%, while the average IoU of 0.91 shows a high degree of similarity between our predicted landmark periods and ground truth.
arXiv Detail & Related papers (2021-08-06T05:52:32Z) - Lesion2Vec: Deep Metric Learning for Few-Shot Multiple Lesions
Recognition in Wireless Capsule Endoscopy Video [0.0]
Wireless Capsule Endoscopy (WCE) has revolutionized traditional endoscopy procedures by allowing gastroenterologists to visualize the entire GI tract non-invasively.
A single video can last up to 8 hours, producing between 30,000 and 100,000 images.
We propose a metric-based learning framework followed by a few-shot lesion recognition in WCE data.
arXiv Detail & Related papers (2021-01-11T23:58:56Z) - Colonoscopy Polyp Detection: Domain Adaptation From Medical Report
Images to Real-time Videos [76.37907640271806]
We propose an Image-video-joint polyp detection network (Ivy-Net) to address the domain gap between colonoscopy images from historical medical reports and real-time videos.
Experiments on the collected dataset demonstrate that our Ivy-Net achieves the state-of-the-art result on colonoscopy video.
arXiv Detail & Related papers (2020-12-31T10:33:09Z) - Motion-based Camera Localization System in Colonoscopy Videos [7.800211144015489]
We propose a camera localization system to estimate the relative location of the camera and classify the colon into anatomical segments.
The experimental results show that the performance of the proposed method is superior to other published methods.
arXiv Detail & Related papers (2020-12-03T03:57:12Z) - LRTD: Long-Range Temporal Dependency based Active Learning for Surgical
Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces non-local block to capture the long-range temporal dependency.
We validate our approach on a large surgical video dataset (Cholec80) by performing surgical workflow recognition task.
arXiv Detail & Related papers (2020-04-21T09:21:22Z)