Correlation-aware active learning for surgery video segmentation
- URL: http://arxiv.org/abs/2311.08811v2
- Date: Mon, 11 Dec 2023 12:57:35 GMT
- Title: Correlation-aware active learning for surgery video segmentation
- Authors: Fei Wu, Pablo Marquez-Neila, Mingyi Zheng, Hedyeh Rafii-Tari, Raphael Sznitman
- Abstract summary: This work proposes COWAL (COrrelation-aWare Active Learning), a novel AL strategy for surgery video segmentation.
Our approach involves projecting images into a latent space that has been fine-tuned using contrastive learning and then selecting a fixed number of representative images from local clusters of video frames.
We demonstrate the effectiveness of this approach on two video datasets of surgical instruments and three real-world video datasets.
- Score: 13.327429312047396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic segmentation is a complex task that relies heavily on large amounts
of annotated image data. However, annotating such data can be time-consuming
and resource-intensive, especially in the medical domain. Active Learning (AL)
is a popular approach that can help to reduce this burden by iteratively
selecting images for annotation to improve the model performance. In the case
of video data, it is important to consider the model uncertainty and the
temporal nature of the sequences when selecting images for annotation. This
work proposes COWAL (COrrelation-aWare Active Learning), a novel AL strategy
for surgery video segmentation. Our approach involves projecting images into
a latent space that has been fine-tuned using contrastive learning and then
selecting a fixed number of representative images from local clusters of video
frames. We demonstrate the effectiveness of this approach on two video datasets
of surgical instruments and three real-world video datasets. The datasets and
code will be made publicly available upon receiving necessary approvals.
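For intuition, here is a minimal sketch of the cluster-based selection step described above: frames are embedded with a (contrastively fine-tuned) encoder, clustered, and the frame closest to each cluster centroid is chosen for annotation. The k-means clustering and nearest-to-centroid rule are illustrative assumptions, not necessarily the paper's exact procedure.

```python
# Sketch of selecting representative frames from local clusters of
# video-frame embeddings. Assumes embeddings come from a contrastively
# fine-tuned encoder; clustering choices here are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def select_representative_frames(embeddings: np.ndarray, budget: int) -> list[int]:
    """Return indices of `budget` frames, one per local cluster.

    embeddings: (num_frames, dim) array of frame embeddings.
    """
    kmeans = KMeans(n_clusters=budget, n_init=10).fit(embeddings)
    selected = []
    for center in kmeans.cluster_centers_:
        # Pick the frame whose embedding is closest to the cluster centroid.
        dists = np.linalg.norm(embeddings - center, axis=1)
        selected.append(int(np.argmin(dists)))
    return selected
```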
Related papers
- Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations [79.87044240860466]
We propose a novel approach that incorporates temporal consistency in dense self-supervised learning.
Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos.
Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images.
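As a rough illustration of the temporal objective, the sketch below penalises disagreement between dense features of consecutive frames. The paper's actual loss is a temporal-alignment clustering loss; this simplified cosine-agreement version is an assumption for illustration only.

```python
# Simplified temporal-consistency objective in the spirit of time-tuning:
# dense features of neighbouring frames are pushed to agree per location.
import torch
import torch.nn.functional as F

def temporal_consistency_loss(feat_t: torch.Tensor, feat_t1: torch.Tensor) -> torch.Tensor:
    """feat_t, feat_t1: (B, C, H, W) dense features of consecutive frames."""
    a = F.normalize(feat_t.flatten(2), dim=1)   # (B, C, H*W)
    b = F.normalize(feat_t1.flatten(2), dim=1)
    # Maximise per-location cosine similarity between the two frames.
    return 1.0 - (a * b).sum(dim=1).mean()
```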
arXiv Detail & Related papers (2023-08-22T21:28:58Z)
- EISeg: An Efficient Interactive Segmentation Tool based on PaddlePaddle [7.588694189597639]
We introduce EISeg, an Efficient Interactive SEGmentation annotation tool that can drastically improve image segmentation annotation efficiency.
We also provide various domain-specific models for remote sensing, medical imaging, industrial quality inspections, human segmentation, and temporally aware models for video segmentation.
arXiv Detail & Related papers (2022-10-17T07:12:13Z)
- iBoot: Image-bootstrapped Self-Supervised Video Representation Learning [45.845595749486215]
Video self-supervised learning (SSL) suffers from added challenges: video datasets are typically not as large as image datasets.
We propose to utilize a strong image-based model, pre-trained with self- or language supervision, in a video representation learning framework.
The proposed algorithm is shown to learn much more efficiently, with fewer epochs and smaller batch sizes.
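The bootstrapping idea can be sketched as a frozen, image-pretrained teacher providing per-frame targets for a trainable video encoder. The function names and the simple regression objective below are illustrative placeholders, not the paper's exact formulation.

```python
# Rough sketch of image-bootstrapped video SSL: a frozen image teacher
# embeds frames, and the video model regresses toward the averaged targets.
import torch
import torch.nn.functional as F

def ibootstrap_loss(video_model, image_teacher, clip: torch.Tensor) -> torch.Tensor:
    """clip: (B, T, C, H, W) batch of video clips."""
    b, t, c, h, w = clip.shape
    with torch.no_grad():
        # Teacher embeds each frame independently; targets are averaged over time.
        frame_targets = image_teacher(clip.reshape(b * t, c, h, w))
        targets = frame_targets.reshape(b, t, -1).mean(dim=1)
    preds = video_model(clip)  # (B, D) clip-level embedding (assumed interface)
    return F.mse_loss(F.normalize(preds, dim=1), F.normalize(targets, dim=1))
```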
arXiv Detail & Related papers (2022-06-16T17:42:48Z)
- Composable Augmentation Encoding for Video Representation Learning [94.2358972764708]
We focus on contrastive methods for self-supervised video representation learning.
A common paradigm in contrastive learning is to construct positive pairs by sampling different data views for the same instance, with different data instances as negatives.
We propose an 'augmentation aware' contrastive learning framework, where we explicitly provide a sequence of augmentation parameterisations.
We show that our method encodes valuable information about the specified spatial or temporal augmentations, and in doing so also achieves state-of-the-art performance on a number of video benchmarks.
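One way to make a contrastive head "augmentation aware" is to condition it on an encoding of the augmentation parameters (crop coordinates, temporal shift, and so on), as sketched below. The architecture and parameterisation are assumptions for illustration, not the paper's exact design.

```python
# Illustrative augmentation-conditioned projection head for contrastive
# learning: the projector is explicitly told which augmentation made the view.
import torch
import torch.nn as nn

class AugmentationAwareHead(nn.Module):
    def __init__(self, feat_dim: int, aug_dim: int, proj_dim: int = 128):
        super().__init__()
        self.aug_encoder = nn.Sequential(nn.Linear(aug_dim, feat_dim), nn.ReLU())
        self.projector = nn.Linear(feat_dim, proj_dim)

    def forward(self, features: torch.Tensor, aug_params: torch.Tensor) -> torch.Tensor:
        # Fuse the augmentation encoding into the representation before projecting.
        return self.projector(features + self.aug_encoder(aug_params))
```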
arXiv Detail & Related papers (2021-04-01T16:48:53Z)
- Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation.
We demonstrate that our approach finds videos with high audio-visual correspondence. Self-supervised models trained on our automatically constructed data achieve downstream performance similar to models trained on existing video datasets of comparable scale.
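In spirit, the curation step selects the subset of clips with the highest audio-visual correspondence. The toy ranking below stands in for the paper's subset optimisation; `score_fn` is a hypothetical scoring callable.

```python
# Toy stand-in for subset optimisation: rank candidate clips by an
# audio-visual correspondence score and keep the top fraction.
def curate(clips, score_fn, keep_ratio: float = 0.1):
    """clips: list of items; score_fn(clip) -> float correspondence score."""
    ranked = sorted(clips, key=score_fn, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]
```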
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
- Rethinking Interactive Image Segmentation: Feature Space Annotation [68.8204255655161]
We propose interactive and simultaneous segment annotation from multiple images guided by feature space projection.
We show that our approach can surpass the accuracy of state-of-the-art methods in foreground segmentation datasets.
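A minimal sketch of annotating in feature space: patch features from many images are projected to 2D, and a label drawn over a region of the projection propagates to every patch inside it. The t-SNE projection and the `lasso_mask` callable are illustrative assumptions, not the paper's interface.

```python
# Sketch of feature-space annotation: project patch features to 2D, then
# propagate an annotator's region selection to all patches inside it.
import numpy as np
from sklearn.manifold import TSNE

def project_for_annotation(patch_features: np.ndarray) -> np.ndarray:
    """patch_features: (num_patches, dim) -> (num_patches, 2) layout."""
    return TSNE(n_components=2, init="pca").fit_transform(patch_features)

def propagate_labels(layout: np.ndarray, lasso_mask, label: int, labels: np.ndarray):
    # lasso_mask(point) -> bool: the annotator's selection in the 2D view.
    for i, point in enumerate(layout):
        if lasso_mask(point):
            labels[i] = label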
arXiv Detail & Related papers (2021-01-12T10:13:35Z)
- Deep Active Learning for Joint Classification & Segmentation with Weak Annotator [22.271760669551817]
CNN visualization and interpretation methods, like class-activation maps (CAMs), are typically used to highlight the image regions linked to class predictions.
We propose an active learning framework, which progressively integrates pixel-level annotations during training.
Our results indicate that, by simply using random sample selection, the proposed approach can significantly outperform state-of-the-art CAMs and AL methods.
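The observation that even random selection works well suggests a bare-bones active-learning loop like the one below, where each round moves a few randomly chosen samples into the labeled pool for pixel-level annotation. The `annotate` and `train` callables are hypothetical placeholders for the actual pipeline.

```python
# Bare-bones active-learning loop with random sample selection.
import random

def active_learning_loop(model, labeled, unlabeled, annotate, train,
                         rounds: int = 5, per_round: int = 10):
    for _ in range(rounds):
        batch = random.sample(unlabeled, min(per_round, len(unlabeled)))
        for sample in batch:
            unlabeled.remove(sample)
            labeled.append(annotate(sample))  # obtain pixel-level annotation
        train(model, labeled)  # retrain with the enlarged labeled pool
    return model
```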
arXiv Detail & Related papers (2020-10-10T03:25:54Z)
- Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation [57.68890534164427]
In this work, we ask whether semi-supervised learning on unlabeled video sequences and extra images can improve performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data.
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
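The iterative scheme can be sketched as a plain pseudo-labeling loop: train a teacher on labeled data, pseudo-label the unlabeled frames, then train a student on both, with the student becoming the next teacher. The `train` and `predict` functions below are placeholders for the actual training pipeline.

```python
# Sketch of the iterative teacher/student pseudo-labeling loop.
def naive_student(labeled, unlabeled_frames, train, predict, iterations: int = 3):
    model = train(labeled)  # initial teacher, trained on human annotations
    for _ in range(iterations):
        # Pseudo-label all unlabeled video frames with the current model.
        pseudo = [(frame, predict(model, frame)) for frame in unlabeled_frames]
        # Student trains on human-annotated plus pseudo-labeled data,
        # then serves as the teacher for the next iteration.
        model = train(labeled + pseudo)
    return model
```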
arXiv Detail & Related papers (2020-05-20T18:00:05Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependencies.
We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task.
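For reference, a minimal single-head non-local block over a sequence of clip features might look as follows; the shapes and single-head design are simplifications relative to NL-RCNet.

```python
# Minimal non-local block: every time step attends to every other time step,
# capturing long-range temporal dependencies across clip features.
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.theta = nn.Linear(dim, dim)
        self.phi = nn.Linear(dim, dim)
        self.g = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, D) sequence of clip features; attend across all time steps.
        attn = torch.softmax(
            self.theta(x) @ self.phi(x).transpose(1, 2) / x.shape[-1] ** 0.5, dim=-1
        )
        return x + self.out(attn @ self.g(x))  # residual connection
```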
arXiv Detail & Related papers (2020-04-21T09:21:22Z)