Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024
- URL: http://arxiv.org/abs/2406.00587v1
- Date: Sun, 2 Jun 2024 01:37:26 GMT
- Title: Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024
- Authors: Biao Wu, Diankai Zhang, Si Gao, Chengjian Zheng, Shaoli Liu, Ning Wang
- Abstract summary: We adopt a semi-supervised video semantic segmentation method based on unreliable pseudo labels.
Our method achieves mIoU scores of 63.71% and 67.83% on the development test and final test, respectively.
We obtain 1st place in the Video Scene Parsing in the Wild Challenge at CVPR 2024.
- Score: 12.274092278786966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pixel-level scene understanding is one of the fundamental problems in computer vision; it aims at recognizing the object classes, masks, and semantics of each pixel in a given image. Compared with image scene parsing, video scene parsing introduces temporal information, which can effectively improve the consistency and accuracy of prediction, because the real world is video-based rather than static. In this paper, we adopt a semi-supervised video semantic segmentation method based on unreliable pseudo labels. We then ensemble the teacher network model with the student network model to generate pseudo labels and retrain the student network. Our method achieves mIoU scores of 63.71% and 67.83% on the development test and final test, respectively. Finally, we obtain 1st place in the Video Scene Parsing in the Wild Challenge at CVPR 2024.
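The teacher-student pseudo-labeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the confidence threshold `tau`, the ensemble weight `alpha`, the function name, and the use of an ignore index for unreliable pixels are all assumptions for the sketch.

```python
import numpy as np

IGNORE_INDEX = 255  # assumed label value for unreliable pixels, excluded from the retraining loss

def ensemble_pseudo_labels(teacher_probs, student_probs, tau=0.8, alpha=0.5):
    """Blend teacher and student softmax maps, keep confident pixels as
    pseudo labels, and mark low-confidence (unreliable) pixels as IGNORE_INDEX.

    teacher_probs, student_probs: arrays of shape (C, H, W) with per-class
    softmax probabilities. tau is the confidence threshold; alpha is the
    teacher's weight in the ensemble. Both values are illustrative.
    """
    probs = alpha * teacher_probs + (1.0 - alpha) * student_probs
    labels = probs.argmax(axis=0)          # most likely class per pixel
    confidence = probs.max(axis=0)         # its ensemble probability
    labels[confidence < tau] = IGNORE_INDEX
    return labels
```

The resulting label map can then be used as supervision to retrain the student network, with the ignore index masking out the unreliable pixels in the cross-entropy loss.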
Related papers
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation [76.40565872257709]
We develop a unified framework which simultaneously models cross-frame dense correspondence for locally discriminative feature learning.
It is able to directly learn to perform mask-guided sequential segmentation from unlabeled videos.
Our algorithm sets state-of-the-arts on two standard benchmarks (i.e., DAVIS17 and YouTube-VOS)
arXiv Detail & Related papers (2023-03-17T16:23:36Z) - Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation.
We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z) - Boundary-aware Self-supervised Learning for Video Scene Segmentation [20.713635723315527]
Video scene segmentation is a task of temporally localizing scene boundaries in a video.
We introduce three novel boundary-aware pretext tasks: Shot-Scene Matching, Contextual Group Matching and Pseudo-boundary Prediction.
We achieve the new state-of-the-art on the MovieNet-SSeg benchmark.
arXiv Detail & Related papers (2022-01-14T02:14:07Z) - Video-Text Pre-training with Learned Regions [59.30893505895156]
Video-Text pre-training aims at learning transferable representations from large-scale video-text pairs.
We propose a module for video-text learning, RegionLearner, which can take into account the structure of objects during pre-training on large-scale video-text pairs.
arXiv Detail & Related papers (2021-12-02T13:06:53Z) - Learning To Segment Dominant Object Motion From Watching Videos [72.57852930273256]
We envision a simple framework for dominant moving object segmentation that neither requires annotated data to train nor relies on saliency priors or pre-trained optical flow maps.
Inspired by a layered image representation, we introduce a technique to group pixel regions according to their affine parametric motion.
This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference.
arXiv Detail & Related papers (2021-11-28T14:51:00Z) - Memory Based Video Scene Parsing [25.452807436316167]
We introduce our solution for the 1st Video Scene Parsing in the Wild Challenge, which achieved an mIoU of 57.44 and obtained 2nd place (our team name is CharlesBLWX).
arXiv Detail & Related papers (2021-09-01T13:18:36Z) - Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z) - VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning [82.09856883441044]
Video understanding relies on perceiving the global content and modeling its internal connections.
We propose a block-wise strategy where we mask neighboring video tokens in both spatial and temporal domains.
We also add an augmentation-free contrastive learning method to further capture global content.
arXiv Detail & Related papers (2021-06-21T16:48:19Z) - Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency [13.19476138523546]
Cross-video relation has barely been explored for visual representation learning.
We propose a novel contrastive learning method which explores the cross-video relation by using cycle-consistency for general image representation learning.
We show significant improvement over state-of-the-art contrastive learning methods.
arXiv Detail & Related papers (2021-05-13T17:59:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.