Contrastive Transformer-based Multiple Instance Learning for Weakly
Supervised Polyp Frame Detection
- URL: http://arxiv.org/abs/2203.12121v1
- Date: Wed, 23 Mar 2022 01:30:48 GMT
- Title: Contrastive Transformer-based Multiple Instance Learning for Weakly
Supervised Polyp Frame Detection
- Authors: Yu Tian and Guansong Pang and Fengbei Liu and Yuyuan Liu and Chong
Wang and Yuanhong Chen and Johan W Verjans and Gustavo Carneiro
- Abstract summary: Current polyp detection methods from colonoscopy videos use exclusively normal (i.e., healthy) training images.
We formulate polyp detection as a weakly-supervised anomaly detection task that uses video-level labelled training data to detect frame-level polyps.
- Score: 30.51410140271929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current polyp detection methods from colonoscopy videos use exclusively
normal (i.e., healthy) training images, which i) ignore the importance of
temporal information in consecutive video frames, and ii) lack knowledge about
the polyps. Consequently, they often have high detection errors, especially on
challenging polyp cases (e.g., small, flat, or partially visible polyps). In
this work, we formulate polyp detection as a weakly-supervised anomaly
detection task that uses video-level labelled training data to detect
frame-level polyps. In particular, we propose a novel convolutional
transformer-based multiple instance learning method designed to identify
abnormal frames (i.e., frames with polyps) from anomalous videos (i.e., videos
containing at least one frame with polyp). In our method, local and global
temporal dependencies are seamlessly captured while we simultaneously optimise
video and snippet-level anomaly scores. A contrastive snippet mining method is
also proposed to enable an effective modelling of the challenging polyp cases.
The resulting method achieves a detection accuracy that is substantially better
than current state-of-the-art approaches on a new large-scale colonoscopy video
dataset introduced in this work.
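The video- and snippet-level MIL scoring described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the convolutional transformer backbone is omitted, and the top-k pooling size `k` and the ranking margin are illustrative assumptions.

```python
import numpy as np

def video_anomaly_score(snippet_scores: np.ndarray, k: int = 3) -> float:
    """Aggregate per-snippet anomaly scores into a video-level score
    by averaging the top-k snippets (a common MIL pooling choice)."""
    topk = np.sort(snippet_scores)[-k:]
    return float(topk.mean())

def mil_ranking_loss(abnormal_scores, normal_scores, k: int = 3,
                     margin: float = 1.0) -> float:
    """Hinge-style MIL ranking loss: the top-k score of an abnormal
    (polyp) video should exceed that of a normal video by a margin."""
    s_abn = video_anomaly_score(np.asarray(abnormal_scores, dtype=float), k)
    s_nor = video_anomaly_score(np.asarray(normal_scores, dtype=float), k)
    return max(0.0, margin - s_abn + s_nor)
```

Under the MIL assumption, an abnormal video contains at least one abnormal snippet, so optimising this ranking objective pushes the highest-scoring snippets of polyp videos above all snippets of normal videos.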
Related papers
- SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation [4.027361638728112]
We propose a video polyp segmentation method that uses self-supervised learning as an auxiliary task and a spatial-temporal self-attention mechanism for improved representation learning.
Our experimental results demonstrate an improvement with respect to several state-of-the-art (SOTA) methods.
Our ablation study confirms that the proposed joint end-to-end training improves network accuracy by over 3% and nearly 10% on the Dice similarity coefficient and intersection-over-union, respectively.
arXiv Detail & Related papers (2024-06-14T17:33:11Z)
- ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection [88.4359020192429]
Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases.
In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training & end-to-end inference framework.
We employ Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps.
In the fine-tuning stage, we introduce an IoU-guided Sample Re-weighting strategy.
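The BCL objective named above is specific to that paper; as a generic, hedged sketch, a supervised contrastive loss over box-pooled features could pull polyp embeddings together while pushing them away from background embeddings. All names, the toy features, and the temperature `tau` here are illustrative assumptions.

```python
import numpy as np

def supcon_loss(features: np.ndarray, labels: np.ndarray,
                tau: float = 0.1) -> float:
    """Supervised contrastive loss over L2-normalised feature vectors:
    same-label (e.g. polyp/background) embeddings are attracted,
    different-label embeddings are repelled via the log-denominator."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / tau          # pairwise cosine similarities / temperature
    n = len(labels)
    loss, terms = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue             # anchors without positives contribute nothing
        others = [j for j in range(n) if j != i]
        denom = np.log(np.sum(np.exp(sim[i, others])))
        loss += -np.mean([sim[i, j] for j in pos]) + denom
        terms += 1
    return float(loss / max(terms, 1))
```

Lower loss means embeddings cluster by their labels; in the detection setting the labels would come from ground-truth polyp boxes versus sampled background regions.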
arXiv Detail & Related papers (2024-01-10T07:03:41Z)
- Self-Supervised Polyp Re-Identification in Colonoscopy [1.9678816712224196]
We propose a robust long term polyp tracking method based on re-identification by visual appearance.
Our solution uses an attention-based self-supervised ML model, specifically designed to leverage the temporal nature of video input.
arXiv Detail & Related papers (2023-06-14T15:53:54Z)
- YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection [80.68520401539979]
YONA (You Only Need one Adjacent Reference-frame) is an efficient end-to-end training framework for video polyp detection.
Our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
arXiv Detail & Related papers (2023-06-06T13:53:15Z)
- Accurate Real-time Polyp Detection in Videos from Concatenation of Latent Features Extracted from Consecutive Frames [5.2009074009536524]
Convolutional neural networks (CNNs) are vulnerable to small changes in the input image.
A CNN-based model may miss the same polyp appearing in a series of consecutive frames.
We propose an efficient feature concatenation method for a CNN-based encoder-decoder model.
arXiv Detail & Related papers (2023-03-10T11:51:22Z)
- Colonoscopy polyp detection with massive endoscopic images [4.458670612147842]
We improved an existing end-to-end polyp detection model with better average precision, validated on different datasets.
Our model achieves state-of-the-art polyp detection performance while maintaining real-time detection speed.
arXiv Detail & Related papers (2022-02-17T16:07:59Z)
- Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block.
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
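As a hedged illustration of the masked-reconstruction idea in this block: the published module learns convolutional kernels around a masked centre of the receptive field and applies channel attention, whereas the sketch below uses a plain mean over the visible neighbourhood as a stand-in predictor; the 3x3 patch size is an assumption.

```python
import numpy as np

def masked_center_prediction(patch: np.ndarray) -> float:
    """Predict the masked centre of a square patch from its visible
    border, using a simple average in place of learned kernels."""
    mask = np.ones_like(patch, dtype=bool)
    c = patch.shape[0] // 2
    mask[c, c] = False            # hide the centre of the receptive field
    return float(patch[mask].mean())

def masked_reconstruction_loss(patch: np.ndarray) -> float:
    """Self-supervised objective: squared error between the true masked
    centre value and its prediction from the surrounding context."""
    c = patch.shape[0] // 2
    pred = masked_center_prediction(patch)
    return float((pred - patch[c, c]) ** 2)
```

The intuition carried over from the summary above: normal content is predictable from its context, so anomalous regions yield a large reconstruction error on the masked area.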
arXiv Detail & Related papers (2021-11-17T13:30:31Z)
- Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos [76.37907640271806]
We propose an Image-video-joint polyp detection network (Ivy-Net) to address the domain gap between colonoscopy images from historical medical reports and real-time videos.
Experiments on the collected dataset demonstrate that our Ivy-Net achieves the state-of-the-art result on colonoscopy video.
arXiv Detail & Related papers (2020-12-31T10:33:09Z)
- Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-frame labels.
Our method uses per-person detectors trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.