5th Place Solution for YouTube-VOS Challenge 2022: Video Object
Segmentation
- URL: http://arxiv.org/abs/2206.09585v1
- Date: Mon, 20 Jun 2022 06:14:27 GMT
- Title: 5th Place Solution for YouTube-VOS Challenge 2022: Video Object
Segmentation
- Authors: Wangwang Yang, Jinming Su, Yiting Duan, Tingyi Guo and Junfeng Luo
- Abstract summary: Video object segmentation (VOS) has made significant progress with the rise of deep learning.
Similar objects are easily confused and tiny objects are difficult to find.
We propose a simple yet effective solution for this task.
- Score: 4.004851693068654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video object segmentation (VOS) has made significant progress with the rise
of deep learning. However, some thorny problems remain: for example, similar
objects are easily confused, and tiny objects are difficult to find.
To solve these problems and further improve the performance of VOS, we propose
a simple yet effective solution for this task. In the solution, we first
analyze the distribution of the YouTube-VOS dataset and supplement it
by introducing public static and video segmentation datasets. Then, we improve
three network architectures with different characteristics and train several
networks to learn the different characteristics of objects in videos. After
that, we use a simple way to integrate all results to ensure that different
models complement each other. Finally, subtle post-processing is carried out to
ensure accurate video object segmentation with precise boundaries. Extensive
experiments on the YouTube-VOS dataset show that the proposed solution achieves
state-of-the-art performance with an 86.1% overall score on the YouTube-VOS
2022 test set, ranking 5th on the video object segmentation track of the
YouTube-VOS Challenge 2022.
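The abstract says the results of the several trained networks are integrated "in a simple way" so the models complement each other, but does not specify the rule. A common simple choice for fusing segmentation outputs is a per-pixel majority vote over the predicted masks; the sketch below illustrates that idea only and is not the authors' confirmed method (the function name and voting rule are assumptions).

```python
import numpy as np

def ensemble_masks(masks):
    """Fuse binary segmentation masks from several models by a
    per-pixel majority vote.

    Hypothetical illustration of "integrating all results": the paper
    does not state its exact integration rule. Ties (exactly half the
    models voting foreground) are resolved as foreground here.
    """
    stack = np.stack(masks, axis=0)      # shape: (num_models, H, W)
    votes = stack.sum(axis=0)            # per-pixel foreground vote count
    return (votes * 2 >= stack.shape[0]).astype(np.uint8)

# Three 2x2 masks from hypothetical models
m1 = np.array([[1, 0], [1, 1]], dtype=np.uint8)
m2 = np.array([[1, 0], [0, 1]], dtype=np.uint8)
m3 = np.array([[0, 0], [1, 1]], dtype=np.uint8)
fused = ensemble_masks([m1, m2, m3])
print(fused)  # majority vote: [[1 0] [1 1]]
```

In a multi-object VOS setting the same vote would be applied per object ID before the subtle boundary post-processing the abstract mentions.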
Related papers
- 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation [63.199793919573295]
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.
Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv Detail & Related papers (2024-06-06T00:56:25Z) - 1st Place Solution for 5th LSVOS Challenge: Referring Video Object
Segmentation [65.45702890457046]
We integrate strengths of leading RVOS models to build up an effective paradigm.
To improve the consistency and quality of masks, we propose Two-Stage Multi-Model Fusion strategy.
Our method achieves 75.7% J&F on the Ref-Youtube-VOS validation set and 70% J&F on the test set, which ranks 1st on track 3 of the 5th Large-scale Video Object Segmentation Challenge (ICCV 2023).
arXiv Detail & Related papers (2024-01-01T04:24:48Z) - MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence.
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z) - 1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object
Segmentation [12.100628128028385]
We improve one-stage method ReferFormer to obtain mask sequences strongly correlated with language descriptions.
We leverage the superior performance of video object segmentation model to further enhance the quality and temporal consistency of the mask results.
Our single model reaches 70.3 J&F on the Referring Youtube-VOS validation set and 63.0 on the test set, ranking 1st on the CVPR 2022 Referring YouTube-VOS challenge.
arXiv Detail & Related papers (2022-12-27T09:22:45Z) - Breaking the "Object" in Video Object Segmentation [36.20167854011788]
We present a dataset for Video Object Segmentation under Transformations (VOST).
It consists of more than 700 high-resolution videos, captured in diverse environments, which are 21 seconds long on average and densely labeled with instance masks.
A careful, multi-step approach is adopted to ensure that these videos focus on complex object transformations, capturing their full temporal extent.
We show that existing methods struggle when applied to this novel task and that their main limitation lies in over-reliance on static appearance cues.
arXiv Detail & Related papers (2022-12-12T19:22:17Z) - Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge [133.80567761430584]
We collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.
OVIS consists of 296k high-quality instance masks and 901 occluded scenes.
All baseline methods encounter a significant performance degradation of about 80% in the heavily occluded object group.
arXiv Detail & Related papers (2021-11-15T17:59:03Z) - Few-Shot Learning for Video Object Detection in a Transfer-Learning
Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z) - Learning Video Object Segmentation from Unlabeled Videos [158.18207922363783]
We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos.
We introduce a unified unsupervised/weakly supervised learning framework, called MuG, that comprehensively captures properties of VOS at multiple granularities.
arXiv Detail & Related papers (2020-03-10T22:12:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.