Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge
- URL: http://arxiv.org/abs/2111.07950v1
- Date: Mon, 15 Nov 2021 17:59:03 GMT
- Title: Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge
- Authors: Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai,
Serge Belongie, Alan Yuille, Philip H.S. Torr, Song Bai
- Abstract summary: We collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.
OVIS consists of 296k high-quality instance masks and 901 occluded scenes.
All baseline methods encounter a significant performance degradation of about 80% in the heavily occluded object group.
- Score: 133.80567761430584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning methods have achieved advanced video object
recognition performance in recent years, perceiving heavily occluded objects in
a video is still a very challenging task. To promote the development of
occlusion understanding, we collect a large-scale dataset called OVIS for video
instance segmentation in the occluded scenario. OVIS consists of 296k
high-quality instance masks and 901 occluded scenes. While the human visual
system can perceive such occluded objects through contextual reasoning and
association, our experiments suggest that current video understanding systems
cannot. On the OVIS dataset, all baseline methods encounter a significant
performance degradation of about 80% in the heavily occluded object group,
which demonstrates that there is still a long way to go in understanding
obscured objects and videos in complex real-world scenarios. To facilitate
research on new paradigms for video understanding systems, we launched a
challenge based on the OVIS dataset. The submitted top-performing algorithms
have achieved much higher performance than our baselines. In this paper, we
will introduce the OVIS dataset and further dissect it by analyzing the results
of baselines and submitted methods. The OVIS dataset and challenge information
can be found at http://songbai.site/ovis.
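
The abstract's "heavily occluded object group" implies scoring each instance's occlusion severity and bucketing instances accordingly. Below is a minimal sketch of one plausible way to do that from per-frame binary instance masks; it is not the OVIS paper's actual occlusion measure (the paper defines its own scoring), and the function names and the 0.25/0.5 thresholds are hypothetical.

```python
import numpy as np

def bounding_box_mask(mask: np.ndarray) -> np.ndarray:
    """Binary mask of the target's tight bounding box."""
    ys, xs = np.nonzero(mask)
    box = np.zeros_like(mask, dtype=bool)
    box[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    return box

def frame_occlusion(target: np.ndarray, others: list[np.ndarray]) -> float:
    """Proxy occlusion score for one frame: the fraction of the target's
    bounding box covered by other instances' visible masks. With modal
    (visible-region) annotations the masks themselves rarely overlap,
    so the box stands in for the object's full extent."""
    if target.sum() == 0 or not others:
        return 0.0
    box = bounding_box_mask(target)
    union_others = np.logical_or.reduce(others)
    return float(np.logical_and(box, union_others).sum() / box.sum())

def video_occlusion(masks_per_frame: list[list[np.ndarray]], idx: int) -> float:
    """Average the per-frame score of instance `idx` over the frames
    in which it is visible."""
    scores = []
    for frame_masks in masks_per_frame:
        target = frame_masks[idx]
        if target.sum() == 0:  # instance absent or fully hidden here
            continue
        others = [m for i, m in enumerate(frame_masks) if i != idx]
        scores.append(frame_occlusion(target, others))
    return float(np.mean(scores)) if scores else 0.0

def occlusion_group(score: float) -> str:
    """Bucket an instance by occlusion severity; the cut-offs here are
    illustrative, not the dataset's official thresholds."""
    if score < 0.25:
        return "slightly occluded"
    if score < 0.5:
        return "moderately occluded"
    return "heavily occluded"
```

Scores like this, averaged per instance across a clip, are what make it possible to report results separately for slightly, moderately, and heavily occluded groups, as in the ~80% degradation figure above.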
Related papers
- VISA: Reasoning Video Object Segmentation via Large Language Models [64.33167989521357]
We introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
This task aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities.
We introduce VISA (Video-based large language Instructed Assistant) to tackle ReasonVOS.
arXiv Detail & Related papers (2024-07-16T02:29:29Z)
- Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset containing 1200 long videos, each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z)
- Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention [29.62044843067169]
Video object segmentation is a fundamental research problem in computer vision.
We propose a new method for self-supervised video object segmentation based on distillation learning of deformable attention.
arXiv Detail & Related papers (2024-01-25T04:39:48Z)
- MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence.
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F; see the metric sketch after this list) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z)
- Occluded Video Instance Segmentation [133.80567761430584]
We collect a large-scale dataset called OVIS for occluded video instance segmentation.
OVIS consists of 296k high-quality instance masks from 25 semantic categories.
The highest AP achieved by state-of-the-art algorithms is only 14.4, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.
arXiv Detail & Related papers (2021-02-02T15:35:43Z)
- Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)
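
The MOSE entry above quotes 90+% J&F, the DAVIS-style video object segmentation score: J is the region Jaccard index (mask IoU) and F is a boundary F-measure. Below is a minimal per-frame sketch; the official evaluation matches boundary pixels within a distance tolerance, which is approximated here with binary dilation, a common shortcut, and all function names are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region similarity J: intersection-over-union of binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union > 0 else 1.0

def boundary(mask: np.ndarray) -> np.ndarray:
    """One-pixel-wide boundary of a binary mask."""
    return np.logical_and(mask, np.logical_not(binary_erosion(mask)))

def boundary_f(pred: np.ndarray, gt: np.ndarray, tol: int = 2) -> float:
    """Boundary F-measure: precision/recall of boundary pixels, matched
    within `tol` pixels by dilating the opposite boundary."""
    pb, gb = boundary(pred), boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0
    if pb.sum() == 0 or gb.sum() == 0:
        return 0.0
    struct = np.ones((2 * tol + 1, 2 * tol + 1), dtype=bool)
    precision = np.logical_and(pb, binary_dilation(gb, struct)).sum() / pb.sum()
    recall = np.logical_and(gb, binary_dilation(pb, struct)).sum() / gb.sum()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

def j_and_f(pred: np.ndarray, gt: np.ndarray) -> float:
    """J&F is conventionally reported as the mean of J and F."""
    return 0.5 * (jaccard(pred, gt) + boundary_f(pred, gt))
```

Per-clip J&F is then averaged over frames and annotated objects, which is the number the 90+% figure refers to.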
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.