Contrastive Transformer-based Multiple Instance Learning for Weakly
Supervised Polyp Frame Detection
- URL: http://arxiv.org/abs/2203.12121v1
- Date: Wed, 23 Mar 2022 01:30:48 GMT
- Title: Contrastive Transformer-based Multiple Instance Learning for Weakly
Supervised Polyp Frame Detection
- Authors: Yu Tian and Guansong Pang and Fengbei Liu and Yuyuan Liu and Chong
Wang and Yuanhong Chen and Johan W Verjans and Gustavo Carneiro
- Abstract summary: Current polyp detection methods from colonoscopy videos use exclusively normal (i.e., healthy) training images.
We formulate polyp detection as a weakly-supervised anomaly detection task that uses video-level labelled training data to detect frame-level polyps.
- Score: 30.51410140271929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current polyp detection methods from colonoscopy videos use exclusively
normal (i.e., healthy) training images, which i) ignore the importance of
temporal information in consecutive video frames, and ii) lack knowledge about
the polyps. Consequently, they often have high detection errors, especially on
challenging polyp cases (e.g., small, flat, or partially visible polyps). In
this work, we formulate polyp detection as a weakly-supervised anomaly
detection task that uses video-level labelled training data to detect
frame-level polyps. In particular, we propose a novel convolutional
transformer-based multiple instance learning method designed to identify
abnormal frames (i.e., frames with polyps) from anomalous videos (i.e., videos
containing at least one frame with polyp). In our method, local and global
temporal dependencies are seamlessly captured while we simultaneously optimise
video and snippet-level anomaly scores. A contrastive snippet mining method is
also proposed to enable an effective modelling of the challenging polyp cases.
The resulting method achieves a detection accuracy that is substantially better
than current state-of-the-art approaches on a new large-scale colonoscopy video
dataset introduced in this work.
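The video- and snippet-level MIL scoring described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the convolutional transformer backbone is omitted, and the top-k pooling size `k` and the ranking margin are illustrative assumptions.

```python
import numpy as np

def video_anomaly_score(snippet_scores: np.ndarray, k: int = 3) -> float:
    """Aggregate per-snippet anomaly scores into a video-level score
    by averaging the top-k snippets (a common MIL pooling choice)."""
    topk = np.sort(snippet_scores)[-k:]
    return float(topk.mean())

def mil_ranking_loss(abnormal_scores, normal_scores, k: int = 3,
                     margin: float = 1.0) -> float:
    """Hinge-style MIL ranking loss: the top-k score of an abnormal
    (polyp) video should exceed that of a normal video by a margin."""
    s_abn = video_anomaly_score(np.asarray(abnormal_scores, dtype=float), k)
    s_nor = video_anomaly_score(np.asarray(normal_scores, dtype=float), k)
    return max(0.0, margin - s_abn + s_nor)
```

Under the MIL assumption, an abnormal video contains at least one abnormal snippet, so optimising this ranking objective pushes the highest-scoring snippets of polyp videos above all snippets of normal videos.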
Related papers
- SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation [4.027361638728112]
We propose a video polyp segmentation method that uses self-supervised learning as an auxiliary task and a spatial-temporal self-attention mechanism for improved representation learning.
Our experimental results demonstrate an improvement with respect to several state-of-the-art (SOTA) methods.
Our ablation study confirms that the proposed joint end-to-end training improves network accuracy by over 3% and nearly 10% on the Dice similarity coefficient and intersection-over-union, respectively.
arXiv Detail & Related papers (2024-06-14T17:33:11Z)
- ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection [88.4359020192429]
Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases.
In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training & end-to-end inference framework.
We employ Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps.
In the fine-tuning stage, we introduce an IoU-guided Sample Re-weighting strategy.
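The BCL objective named above is specific to that paper; as a generic, hedged sketch, a supervised contrastive loss over box-pooled features could pull polyp embeddings together while pushing them away from background embeddings. All names, the toy features, and the temperature `tau` here are illustrative assumptions.

```python
import numpy as np

def supcon_loss(features: np.ndarray, labels: np.ndarray,
                tau: float = 0.1) -> float:
    """Supervised contrastive loss over L2-normalised feature vectors:
    same-label (e.g. polyp/background) embeddings are attracted,
    different-label embeddings are repelled via the log-denominator."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / tau          # pairwise cosine similarities / temperature
    n = len(labels)
    loss, terms = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue             # anchors without positives contribute nothing
        others = [j for j in range(n) if j != i]
        denom = np.log(np.sum(np.exp(sim[i, others])))
        loss += -np.mean([sim[i, j] for j in pos]) + denom
        terms += 1
    return float(loss / max(terms, 1))
```

Lower loss means embeddings cluster by their labels; in the detection setting the labels would come from ground-truth polyp boxes versus sampled background regions.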
arXiv Detail & Related papers (2024-01-10T07:03:41Z)
- Self-Supervised Polyp Re-Identification in Colonoscopy [1.9678816712224196]
We propose a robust long term polyp tracking method based on re-identification by visual appearance.
Our solution uses an attention-based self-supervised ML model, specifically designed to leverage the temporal nature of video input.
arXiv Detail & Related papers (2023-06-14T15:53:54Z)
- YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection [80.68520401539979]
YONA (You Only Need one Adjacent Reference-frame) is an efficient end-to-end training framework for video polyp detection.
Our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
arXiv Detail & Related papers (2023-06-06T13:53:15Z)
- Accurate Real-time Polyp Detection in Videos from Concatenation of Latent Features Extracted from Consecutive Frames [5.2009074009536524]
Convolutional neural networks (CNNs) are vulnerable to small changes in the input image.
A CNN-based model may miss the same polyp appearing in a series of consecutive frames.
We propose an efficient feature concatenation method for a CNN-based encoder-decoder model.
arXiv Detail & Related papers (2023-03-10T11:51:22Z)
- Colonoscopy polyp detection with massive endoscopic images [4.458670612147842]
We improved an existing end-to-end polyp detection model with better average precision, validated on different datasets.
Our model achieves state-of-the-art polyp detection performance while maintaining real-time detection speed.
arXiv Detail & Related papers (2022-02-17T16:07:59Z)
- Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block.
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
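As a hedged illustration of the masked-reconstruction idea in this block: the published module learns convolutional kernels around a masked centre of the receptive field and applies channel attention, whereas the sketch below uses a plain mean over the visible neighbourhood as a stand-in predictor; the 3x3 patch size is an assumption.

```python
import numpy as np

def masked_center_prediction(patch: np.ndarray) -> float:
    """Predict the masked centre of a square patch from its visible
    border, using a simple average in place of learned kernels."""
    mask = np.ones_like(patch, dtype=bool)
    c = patch.shape[0] // 2
    mask[c, c] = False            # hide the centre of the receptive field
    return float(patch[mask].mean())

def masked_reconstruction_loss(patch: np.ndarray) -> float:
    """Self-supervised objective: squared error between the true masked
    centre value and its prediction from the surrounding context."""
    c = patch.shape[0] // 2
    pred = masked_center_prediction(patch)
    return float((pred - patch[c, c]) ** 2)
```

The intuition carried over from the summary above: normal content is predictable from its context, so anomalous regions yield a large reconstruction error on the masked area.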
arXiv Detail & Related papers (2021-11-17T13:30:31Z)
- Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos [76.37907640271806]
We propose an Image-video-joint polyp detection network (Ivy-Net) to address the domain gap between colonoscopy images from historical medical reports and real-time videos.
Experiments on the collected dataset demonstrate that our Ivy-Net achieves the state-of-the-art result on colonoscopy video.
arXiv Detail & Related papers (2020-12-31T10:33:09Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-frame labels.
Our method uses per-person detectors trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.