Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations
- URL: http://arxiv.org/abs/2207.09664v1
- Date: Wed, 20 Jul 2022 05:42:19 GMT
- Title: Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations
- Authors: Yang Yu, Zixu Zhao, Yueming Jin, Guangyong Chen, Qi Dou and Pheng-Ann Heng
- Abstract summary: We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation.
We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
- Score: 72.15956198507281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical scene segmentation is fundamentally crucial for prompting cognitive assistance in robotic surgery. However, annotating surgical video pixel-wise, frame by frame, is expensive and time-consuming. To greatly reduce the labeling burden, in this work we study semi-supervised scene segmentation from robotic surgical video, which is practically essential yet rarely explored before. We consider a clinically suitable annotation setting under equidistant frame sampling. We then propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation. It effectively leverages unlabeled data for trustworthy, global model regularization that produces more discriminative feature representations. Concretely, for trustworthy representation learning, we incorporate pseudo labels to guide pair selection, obtaining more reliable representation pairs for pixel contrast. Moreover, we expand the representation learning space from the image level of previous work to cross-video, which captures global semantics to benefit the learning process. We extensively evaluate our method on the public robotic surgery dataset EndoVis18 and the public cataract dataset CaDIS. Experimental results demonstrate the effectiveness of our method: it consistently outperforms state-of-the-art semi-supervised methods under different labeling ratios, and even surpasses fully supervised training on EndoVis18 with a 10.1% labeling ratio.
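Reading the abstract mechanically, the loss has two ingredients: pseudo labels gate which pixel pairs are trusted, and candidate pairs are drawn from other videos rather than only the current image. Below is a minimal PyTorch sketch of such a pseudo-label guided, cross-video pixel contrast loss; the function name, confidence threshold, and memory-bank layout are our illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def pixel_contrast_loss(embed, pseudo_label, confidence, bank_embed, bank_label,
                        tau=0.1, conf_thresh=0.9, num_anchors=256):
    """Pseudo-label guided pixel contrast (illustrative sketch).

    embed:        (N, D) pixel embeddings from the current batch
    pseudo_label: (N,)   hard pseudo labels for those pixels
    confidence:   (N,)   softmax confidence of each pseudo label
    bank_embed:   (M, D) embeddings sampled from *other* videos (cross-video bank)
    bank_label:   (M,)   their (pseudo) labels
    """
    # Trust only confident pseudo labels when forming pairs.
    keep = confidence > conf_thresh
    embed, pseudo_label = embed[keep], pseudo_label[keep]
    if embed.size(0) == 0:
        return embed.new_zeros(())

    # Subsample anchors to keep the pairwise similarity matrix small.
    idx = torch.randperm(embed.size(0))[:num_anchors]
    anchors, labels = F.normalize(embed[idx], dim=1), pseudo_label[idx]

    bank = F.normalize(bank_embed, dim=1)
    logits = anchors @ bank.t() / tau                      # (A, M) similarities
    pos = labels.unsqueeze(1) == bank_label.unsqueeze(0)   # same class => positive pair
    valid = pos.any(1)
    if not valid.any():
        return embed.new_zeros(())

    # InfoNCE averaged over all positives of each anchor.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss[valid].mean()
```

In practice the bank would be refreshed with embeddings sampled from many videos, which is what makes the contrast "cross-video" rather than image-level.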
Related papers
- Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View [7.594796294925481]
We propose an unsupervised method that reframes video frame segmentation as a graph partitioning problem.
A self-supervised pre-trained model is first leveraged as a feature extractor to capture high-level semantic features.
On the "deep" eigenvectors, a surgical video frame is meaningfully segmented into different modules, such as tools and tissues, providing distinguishable semantic information; a sketch of this idea follows.
arXiv Detail & Related papers (2024-08-27T05:31:30Z)
- Dual-Decoder Consistency via Pseudo-Labels Guided Data Augmentation for Semi-Supervised Medical Image Segmentation [13.707121013895929]
We present a novel semi-supervised learning method, Dual-Decoder Consistency via Pseudo-Labels Guided Data Augmentation.
We use distinct decoders for the student and teacher networks while maintaining the same encoder.
To learn from unlabeled data, we use pseudo-labels generated by the teacher network and augment the training data with them, as sketched below.
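A compact sketch of one training step for such a shared-encoder, dual-decoder setup, assuming an EMA teacher decoder and a simple noise stand-in for the augmentation; the paper's exact coupling and augmentation policy may differ.

```python
import torch
import torch.nn.functional as F

def train_step(encoder, student_dec, teacher_dec, x_lab, y_lab, x_unlab,
               optimizer, ema_m=0.99):
    """One semi-supervised step: supervised CE + pseudo-label CE (sketch)."""
    # Supervised branch on labeled images.
    logits_lab = student_dec(encoder(x_lab))
    loss = F.cross_entropy(logits_lab, y_lab)

    # Teacher decoder produces pseudo-labels on unlabeled images.
    with torch.no_grad():
        pseudo = teacher_dec(encoder(x_unlab)).argmax(dim=1)

    # Student decoder learns from augmented unlabeled images + pseudo-labels.
    x_aug = x_unlab + 0.1 * torch.randn_like(x_unlab)   # stand-in augmentation
    loss = loss + F.cross_entropy(student_dec(encoder(x_aug)), pseudo)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Keep the teacher decoder an EMA of the student decoder (our assumption).
    for pt, ps in zip(teacher_dec.parameters(), student_dec.parameters()):
        pt.data.mul_(ema_m).add_(ps.data, alpha=1 - ema_m)
    return loss.item()
```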
arXiv Detail & Related papers (2023-08-31T09:13:34Z)
- Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation [58.03255076119459]
We address the task of weakly-supervised few-shot image classification and segmentation by leveraging a Vision Transformer (ViT).
Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions.
Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings.
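One plausible reading of "leveraging token correlations to produce segmentation predictions": build a class prototype from support patch tokens and correlate it with the query's tokens. The sketch below is hypothetical, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def correlation_mask(support_tokens, support_mask, query_tokens, grid=14):
    """Few-shot mask from token correlations (illustrative).

    support_tokens: (P, D) patch tokens of the support image (P = grid*grid)
    support_mask:   (P,)   binary foreground mask over support patches
    query_tokens:   (P, D) patch tokens of the query image
    """
    s = F.normalize(support_tokens, dim=1)
    q = F.normalize(query_tokens, dim=1)

    # Foreground prototype = masked average of support tokens.
    proto = F.normalize((s * support_mask.unsqueeze(1)).sum(0, keepdim=True), dim=1)

    # Cosine correlation of each query token with the prototype -> coarse mask.
    corr = (q @ proto.t()).view(1, 1, grid, grid)
    return F.interpolate(corr, scale_factor=16, mode="bilinear")  # upsample to pixels
```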
arXiv Detail & Related papers (2023-07-07T06:16:43Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages existing pretrained vision-language (VL) models to train semantic segmentation models without human labels.
ZeroSeg achieves this by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
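Our reading of "distilling VL concepts into segment tokens", sketched under assumptions: pixel features already aligned to a VL embedding space are pooled per segment, then each segment token is scored against class-name text embeddings. Every name here is illustrative.

```python
import torch
import torch.nn.functional as F

def label_segments(pixel_feats, segment_ids, text_embeds):
    """Assign open-vocabulary labels to segments (illustrative sketch).

    pixel_feats: (H*W, D) pixel features aligned to a VL embedding space
    segment_ids: (H*W,)   segment index per pixel (e.g. from a mask proposer)
    text_embeds: (C, D)   text embeddings of the candidate class names
    """
    num_seg = int(segment_ids.max().item()) + 1
    # One token per segment = mean-pooled features; assumes no segment is empty.
    tokens = torch.stack([pixel_feats[segment_ids == s].mean(0)
                          for s in range(num_seg)])
    tokens = F.normalize(tokens, dim=1)
    text = F.normalize(text_embeds, dim=1)
    return (tokens @ text.t()).argmax(dim=1)   # best class per segment
```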
arXiv Detail & Related papers (2023-06-01T08:47:06Z) - Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene
Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z) - Min-Max Similarity: A Contrastive Learning Based Semi-Supervised
Learning Network for Surgical Tools Segmentation [0.0]
We propose a semi-supervised segmentation network based on contrastive learning.
In contrast to the previous state-of-the-art, we introduce a dual-view training scheme formulated as contrastive learning.
Our proposed method outperforms state-of-the-art semi-supervised and fully supervised segmentation algorithms consistently.
arXiv Detail & Related papers (2022-03-29T01:40:26Z) - Semi-supervised Contrastive Learning for Label-efficient Medical Image
Segmentation [11.935891325600952]
We propose a supervised local contrastive loss that leverages limited pixel-wise annotations to pull pixels with the same label together in the embedding space.
With different amounts of labeled data, our methods consistently outperform the state-of-the-art contrast-based methods and other semi-supervised learning techniques.
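A minimal form of such a supervised pixel-level contrastive loss, in the SupCon style: labeled pixels of the same class attract, all others repel. The sampling scheme and temperature here are our assumptions.

```python
import torch
import torch.nn.functional as F

def supervised_pixel_contrast(embed, label, tau=0.07, num_samples=512):
    """Supervised local contrastive loss over labeled pixels (sketch).

    embed: (N, D) pixel embeddings; label: (N,) ground-truth classes.
    """
    # Subsample pixels so the pairwise matrix stays tractable.
    idx = torch.randperm(embed.size(0))[:num_samples]
    z, y = F.normalize(embed[idx], dim=1), label[idx]

    sim = z @ z.t() / tau
    self_mask = torch.eye(len(y), dtype=torch.bool)
    pos = (y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask   # same-label pairs
    valid = pos.any(1)
    if not valid.any():
        return embed.new_zeros(())

    # Exclude self-similarity from the denominator, then average over positives.
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, -1e9),
                                     dim=1, keepdim=True)
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss[valid].mean()
```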
arXiv Detail & Related papers (2021-09-15T16:23:48Z) - Self-Ensembling Contrastive Learning for Semi-Supervised Medical Image
Segmentation [6.889911520730388]
We aim to boost the performance of semi-supervised learning for medical image segmentation with limited labels.
We learn latent representations directly at the feature level by imposing a contrastive loss on unlabeled images.
We conduct experiments on an MRI and a CT segmentation dataset and demonstrate that the proposed method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-05-27T03:27:58Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences
for Urban Scene Segmentation [57.68890534164427]
In this work, we ask whether we can leverage semi-supervised learning on unlabeled video sequences and extra images to improve performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data.
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
arXiv Detail & Related papers (2020-05-20T18:00:05Z)
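The Naive-Student recipe above reduces to a short iterative loop: train a teacher on human labels, pseudo-label the unlabeled frames, and retrain a fresh student on the union. A schematic sketch with placeholder train/predict helpers of our own naming:

```python
def naive_student(train_fn, predict_fn, labeled, unlabeled, rounds=3):
    """Iterative pseudo-label self-training (schematic sketch).

    train_fn(dataset) -> model; predict_fn(model, image) -> pseudo-label.
    `labeled` is a list of (image, label) pairs; `unlabeled` a list of images.
    """
    model = train_fn(labeled)                        # teacher from human labels only
    for _ in range(rounds):
        pseudo = [(x, predict_fn(model, x)) for x in unlabeled]
        # Each round trains a *fresh* student on human + pseudo labels.
        model = train_fn(labeled + pseudo)
    return model
```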