Graph Convolution Neural Network For Weakly Supervised Abnormality
Localization In Long Capsule Endoscopy Videos
- URL: http://arxiv.org/abs/2110.09110v1
- Date: Mon, 18 Oct 2021 09:00:24 GMT
- Title: Graph Convolution Neural Network For Weakly Supervised Abnormality
Localization In Long Capsule Endoscopy Videos
- Authors: Sodiq Adewole, Philip Fernandes, James Jablonski, Andrew Copland,
Michael Porter, Sana Syed, Donald Brown
- Abstract summary: We propose an end-to-end temporal abnormality localization method for long WCE videos using only weak video-level labels.
Our method achieved an accuracy of 89.9% on the graph classification task and a specificity of 97.5% on the abnormal frames localization task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal activity localization in long videos is an important problem. The
cost of obtaining frame level label for long Wireless Capsule Endoscopy (WCE)
videos is prohibitive. In this paper, we propose an end-to-end temporal
abnormality localization method for long WCE videos using only weak video-level
labels. Physicians use Capsule Endoscopy (CE) as a non-surgical and
non-invasive method to examine the entire digestive tract in order to diagnose
diseases or abnormalities. While CE has revolutionized traditional endoscopy
procedures, a single CE examination can last up to 8 hours and generate as many
as 100,000 frames. Physicians must review the entire video, frame by frame, to
identify the frames capturing the relevant abnormality, which can sometimes be
as few as a single frame. Given this very high level of redundancy, analyzing
long CE videos can be very tedious, time-consuming, and error-prone. This paper
presents a novel multi-step method for an
end-to-end localization of target frames capturing abnormalities of interest in
the long video using only weak video labels. First, we developed an automatic
temporal segmentation step that uses a change point detection technique to
temporally segment the video into uniform, homogeneous, and identifiable
segments. Then we employed a Graph Convolutional Neural Network (GCNN) to learn
a representation of each video segment. Using weak video-segment labels, we
trained our GCNN model
to recognize each video segment as abnormal if it contains at least a single
abnormal frame. Finally, leveraging the parameters of the trained GCNN model,
we replaced the final layer of the network with a temporal pool layer to
localize the relevant abnormal frames within each abnormal video segment. Our
method achieved an accuracy of 89.9% on the graph classification task and a
specificity of 97.5% on the abnormal frames localization task.
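To make the pipeline concrete, below is a minimal, self-contained PyTorch sketch of the three steps described in the abstract: splitting a video into homogeneous segments, classifying each segment with a graph convolutional network trained only on weak segment-level labels, and localizing abnormal frames through a temporal pooling layer. This is not the authors' implementation: the feature dimension, the k-nearest-neighbour graph construction, the distance-threshold change point rule, and names such as FrameGCN, segment_video, and knn_graph are illustrative assumptions, and the max-based pooling stands in for the paper's temporal pool layer.

```python
# Minimal sketch (not the authors' released code) of weakly supervised
# abnormality localization over pre-extracted per-frame feature vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F


def segment_video(features: torch.Tensor, threshold: float = 1.5):
    """Split a video into homogeneous segments by thresholding the feature
    distance between consecutive frames (a simple stand-in for the paper's
    change point detection step). Returns a list of (start, end) indices."""
    dists = (features[1:] - features[:-1]).norm(dim=1)
    cuts = (dists > threshold * dists.mean()).nonzero(as_tuple=True)[0] + 1
    bounds = [0, *cuts.tolist(), features.shape[0]]
    return list(zip(bounds[:-1], bounds[1:]))


def knn_graph(features: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Binary k-NN adjacency over frame features (assumption: the paper's
    exact graph construction may differ)."""
    dists = torch.cdist(features, features)
    idx = dists.topk(k + 1, largest=False).indices[:, 1:]  # drop self-match
    adj = torch.zeros(features.shape[0], features.shape[0])
    adj.scatter_(1, idx, 1.0)
    return ((adj + adj.t()) > 0).float()


class FrameGCN(nn.Module):
    """Two-layer graph convolution over the frames of one segment, followed
    by a temporal (max) pool that yields a segment-level logit while keeping
    per-frame scores for localization."""

    def __init__(self, in_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.gc1 = nn.Linear(in_dim, hidden_dim)
        self.gc2 = nn.Linear(hidden_dim, hidden_dim)
        self.scorer = nn.Linear(hidden_dim, 1)  # per-frame abnormality score

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        # Symmetrically normalised adjacency: D^{-1/2} (A + I) D^{-1/2}
        a_hat = adj + torch.eye(adj.shape[0])
        d_inv_sqrt = a_hat.sum(dim=1).rsqrt().diag()
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
        h = F.relu(self.gc1(a_norm @ x))
        h = F.relu(self.gc2(a_norm @ h))
        frame_scores = self.scorer(h).squeeze(-1)  # one score per frame
        segment_logit = frame_scores.max()         # temporal max pool
        return segment_logit, frame_scores


# Training uses only weak labels: a segment is abnormal (label 1.0) if it
# contains at least one abnormal frame; frame_scores are never supervised.
model = FrameGCN(in_dim=512)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

features = torch.randn(400, 512)   # toy per-frame features for one video
label = torch.tensor(1.0)          # toy weak label for this video's segments
for start, end in segment_video(features):
    seg_feats = features[start:end]
    if seg_feats.shape[0] <= 1:
        continue
    adj = knn_graph(seg_feats, k=min(5, seg_feats.shape[0] - 1))
    segment_logit, frame_scores = model(seg_feats, adj)
    loss = F.binary_cross_entropy_with_logits(segment_logit, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # At inference, thresholding torch.sigmoid(frame_scores) localizes the
    # abnormal frames inside a segment predicted as abnormal.
```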
Related papers
- AtGCN: A Graph Convolutional Network For Ataxic Gait Detection [0.0]
This paper presents a graph convolution network called AtGCN for detecting ataxic gait.
The problem is challenging as the deviation of an ataxic gait from a healthy gait is very subtle.
The proposed AtGCN model outperforms the state-of-the-art in detection and prediction with an accuracy of 93.46% and a MAE of 0.4169, respectively.
arXiv Detail & Related papers (2024-10-30T09:55:30Z)
- Vivim: a Video Vision Mamba for Medical Video Segmentation [52.11785024350253]
This paper presents a Video Vision Mamba-based framework, dubbed as Vivim, for medical video segmentation tasks.
Our Vivim can effectively compress the long-term representation into sequences at varying scales.
Experiments on thyroid segmentation, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of our Vivim.
arXiv Detail & Related papers (2024-01-25T13:27:03Z)
- Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z)
- Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level.
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z)
- A Hierarchical Spatio-Temporal Graph Convolutional Neural Network for Anomaly Detection in Videos [11.423072255384469]
We propose a Hierarchical Spatio-Temporal Graph Convolutional Neural Network (HSTGCNN) to address these problems.
HSTGCNN is composed of multiple branches that correspond to different levels of graph representations.
High-level graph representations are assigned higher weights to encode moving speed and directions of people in low-resolution videos while low-level graph representations are assigned higher weights to encode human skeletons in high-resolution videos.
arXiv Detail & Related papers (2021-12-08T14:03:33Z)
- Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation [51.68840525174265]
Video instance segmentation aims to detect, segment, and track objects in a video.
Current approaches extend image-level segmentation algorithms to the temporal domain.
We propose a video instance segmentation method that alleviates the problem due to missing detections.
arXiv Detail & Related papers (2021-11-15T04:15:57Z)
- Unsupervised Shot Boundary Detection for Temporal Segmentation of Long Capsule Endoscopy Videos [0.0]
Physicians use Capsule Endoscopy (CE) as a non-invasive and non-surgical procedure to examine the entire gastrointestinal (GI) tract.
A single CE examination could last between 8 and 11 hours, generating up to 80,000 frames that are compiled into a video.
arXiv Detail & Related papers (2021-10-18T07:22:46Z)
- Anomaly Detection in Video Sequences: A Benchmark and Computational Model [25.25968958782081]
We contribute a new Large-scale Anomaly Detection (LAD) database as the benchmark for anomaly detection in video sequences.
It contains 2000 video sequences including normal and abnormal video clips with 14 anomaly categories including crash, fire, violence, etc.
It provides the annotation data, including video-level labels (abnormal/normal video, anomaly type) and frame-level labels (abnormal/normal video frame) to facilitate anomaly detection.
We propose a multi-task deep neural network to solve anomaly detection as a fully-supervised learning problem.
arXiv Detail & Related papers (2021-06-16T06:34:38Z)
- Reconstructive Sequence-Graph Network for Video Summarization [107.0328985865372]
Exploiting the inner-shot and inter-shot dependencies is essential for key-shot based video summarization.
We propose a Reconstructive Sequence-Graph Network (RSGN) to encode the frames and shots as sequence and graph hierarchically.
A reconstructor is developed to reward the summary generator, so that the generator can be optimized in an unsupervised manner.
arXiv Detail & Related papers (2021-05-10T01:47:55Z)
- Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification [110.52328716130022]
Video-based person re-identification (re-ID) is an important research topic in computer vision.
We propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to achieve better representational capabilities.
MGH achieves 90.0% top-1 accuracy on MARS, outperforming state-of-the-art schemes.
arXiv Detail & Related papers (2021-04-30T11:20:02Z)
- A Self-Reasoning Framework for Anomaly Detection Using Video-Level Labels [17.615297975503648]
Anomalous event detection in surveillance videos is a challenging and practical research problem in the image and video processing community.
We propose a weakly supervised anomaly detection framework based on deep neural networks which is trained in a self-reasoning fashion using only video-level labels.
The proposed framework has been evaluated on publicly available real-world anomaly detection datasets including UCF-crime, ShanghaiTech and Ped2.
arXiv Detail & Related papers (2020-08-27T02:14:15Z)