Memory Based Video Scene Parsing
- URL: http://arxiv.org/abs/2109.00373v1
- Date: Wed, 1 Sep 2021 13:18:36 GMT
- Title: Memory Based Video Scene Parsing
- Authors: Zhenchao Jin, Dongdong Yu, Kai Su, Zehuan Yuan, Changhu Wang
- Abstract summary: We introduce our solution for the 1st Video Scene Parsing in the Wild Challenge, which achieves a mIoU of 57.44 and obtained the 2nd place (our team name is CharlesBLWX)
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video scene parsing is a long-standing challenging task in computer vision,
aiming to assign pre-defined semantic labels to pixels of all frames in a given
video. Compared with image semantic segmentation, this task focuses on how to
exploit temporal information to obtain higher predictive accuracy. In this
report, we introduce our solution for the 1st Video Scene Parsing in the Wild
Challenge, which achieves an mIoU of 57.44 and obtains 2nd place (our team
name is CharlesBLWX).
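As a point of reference for the metric above, mIoU averages per-class intersection-over-union over all classes present. A minimal sketch (the `ignore_index` convention, shapes, and class count are illustrative assumptions, not details taken from the paper):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean Intersection-over-Union across classes.

    pred, gt: integer label maps of the same shape.
    Pixels labelled `ignore_index` in gt are excluded from the metric.
    """
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

For video scene parsing, challenge leaderboards typically compute this over the pixels of all frames pooled together.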
Related papers
- 2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation [8.20168024462357]
Motion Expression guided Video is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions.
We introduce mask information obtained from the video instance segmentation model as preliminary information for temporal enhancement and employ SAM for spatial refinement.
Our method achieved a score of 49.92 J&F in the validation phase and 54.20 J&F in the test phase, securing 2nd place in the MeViS Track at the CVPR 2024 PVUW Challenge.
arXiv Detail & Related papers (2024-06-20T02:16:23Z)
- Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024 [12.274092278786966]
We adopt a semi-supervised video semantic segmentation method based on unreliable pseudo labels.
Our method achieves the mIoU scores of 63.71% and 67.83% on development test and final test respectively.
We obtain the 1st place in the Video Scene Parsing in the Wild Challenge at CVPR 2024.
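The most basic way to produce pseudo labels is to keep only high-confidence predictions and mask out the rest. A minimal sketch of that baseline (threshold and shapes are illustrative assumptions; the paper's treatment of unreliable labels goes beyond simply discarding low-confidence pixels):

```python
import numpy as np

def confidence_pseudo_labels(probs, threshold=0.9, ignore_index=255):
    """Convert per-pixel class probabilities into pseudo labels,
    marking low-confidence pixels as ignore_index so the training
    loss skips them.

    probs: array of shape (num_classes, H, W) with softmax scores.
    """
    conf = probs.max(axis=0)       # top-1 confidence per pixel
    labels = probs.argmax(axis=0)  # top-1 class per pixel
    labels[conf < threshold] = ignore_index
    return labels
```

Methods built on unreliable pseudo labels instead try to extract a training signal from the masked-out pixels as well, e.g. by treating them as negatives for the classes they are confidently not.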
arXiv Detail & Related papers (2024-06-02T01:37:26Z)
- 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation [12.274092278786966]
Video Panoptic Segmentation (VPS) aims to simultaneously classify, track, and segment all objects in a video.
We propose a robust integrated video panoptic segmentation solution.
Our method achieves state-of-the-art performance with VPQ scores of 56.36 and 57.12 in the development and test phases.
arXiv Detail & Related papers (2024-06-01T17:03:16Z)
- Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding [59.599378814835205]
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query.
We introduce a novel AMDA method to adaptively adjust the model's scene-related knowledge by incorporating insights from the target data.
arXiv Detail & Related papers (2023-12-21T07:49:27Z)
- HierVL: Learning Hierarchical Video-Language Embeddings [108.77600799637172]
HierVL is a novel hierarchical video-language embedding that simultaneously accounts for both long-term and short-term associations.
We introduce a hierarchical contrastive training objective that encourages text-visual alignment at both the clip level and video level.
Our hierarchical scheme yields a clip representation that outperforms its single-level counterpart, as well as a long-term video representation that achieves state-of-the-art results.
arXiv Detail & Related papers (2023-01-05T21:53:19Z)
- Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022 [93.98605636451806]
This report describes the SViT approach for the Ego4D Point of No Return (PNR) Temporal Localization Challenge.
We propose a learning framework that demonstrates how utilizing the structure of a small number of images, available only during training, can improve a video model.
SViT obtains strong performance on the challenge test set with 0.656 absolute temporal localization error.
arXiv Detail & Related papers (2022-06-15T17:36:38Z)
- End-to-end Dense Video Captioning as Sequence Generation [83.90502354328679]
We show how to model the two subtasks of dense video captioning jointly as one sequence generation task.
Experiments on YouCook2 and ViTT show encouraging results and indicate the feasibility of integrating such complex tasks into large-scale pre-trained models.
arXiv Detail & Related papers (2022-04-18T01:30:54Z)
- Semantic Segmentation on VSPW Dataset through Aggregation of Transformer Models [10.478712332545854]
This report introduces the solutions of team 'BetterThing' for the ICCV2021 - Video Scene Parsing in the Wild Challenge.
Transformers are used as the backbone for extracting video frame features, and the final result is the aggregation of the outputs of two Transformer models, SWIN and VOLO.
This solution achieves 57.3% mIoU, ranking 3rd in the Video Scene Parsing in the Wild Challenge.
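Aggregating two segmentation models usually means fusing their per-pixel class scores before the final argmax. A minimal sketch of weighted logit averaging (the weight and array shapes are illustrative assumptions; the report's exact fusion scheme may differ):

```python
import numpy as np

def aggregate_logits(logits_a, logits_b, w=0.5):
    """Fuse per-pixel class scores from two segmentation models
    (e.g. a SWIN-based and a VOLO-based head) by weighted averaging,
    then take the per-pixel argmax as the final label map.

    logits_*: arrays of shape (num_classes, H, W).
    """
    fused = w * logits_a + (1.0 - w) * logits_b
    return fused.argmax(axis=0)
```

Averaging scores rather than hard labels lets a confident model outvote an uncertain one on a per-pixel basis.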
arXiv Detail & Related papers (2021-09-03T05:20:08Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- Video Panoptic Segmentation [117.08520543864054]
We propose and explore a new video extension of this task, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
arXiv Detail & Related papers (2020-06-19T19:35:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.