LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation
- URL: http://arxiv.org/abs/2409.05847v1
- Date: Mon, 9 Sep 2024 17:45:45 GMT
- Title: LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation
- Authors: Henghui Ding, Lingyi Hong, Chang Liu, Ning Xu, Linjie Yang, Yuchen Fan, Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Jinming Chai, Qin Ma, Junpei Zhang, Licheng Jiao, Fang Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, LingLing Li, Hao Fang, Feiyu Pan, Xiankai Lu, Wei Zhang, Runmin Cong, Tuyen Tran, Bin Cao, Yisi Zhang, Hanyi Wang, Xingjian He, Jing Liu,
- Abstract summary: This paper introduces the 6th Large-scale Video Object Segmentation (LSVOS) challenge, held in conjunction with the ECCV 2024 workshop.
This year's challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS).
This year's challenge attracted 129 registered teams from more than 20 institutes across over 8 countries.
- Score: 124.50550604020684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes. In this paper, we introduce the 6th Large-scale Video Object Segmentation (LSVOS) challenge, held in conjunction with the ECCV 2024 workshop. This year's challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). This year, we replace the classic YouTube-VOS and YouTube-RVOS benchmarks with the latest datasets MOSE, LVOS, and MeViS to assess VOS under more challenging, complex environments. This year's challenge attracted 129 registered teams from more than 20 institutes across over 8 countries. This report includes the challenge and dataset introductions and the methods used by the top 7 teams in the two tracks. More details can be found on our homepage https://lsvos.github.io/.
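Several of the results listed below are reported as J&F scores, i.e., the mean of region similarity J (mask IoU) and contour accuracy F (a boundary F-measure). The following is a minimal sketch of these two terms, assuming per-frame boolean mask arrays and a simplified dilation-based boundary match rather than the official evaluation toolkits of MOSE, LVOS, or MeViS.

```python
# Minimal sketch of the J&F metric used to rank VOS/RVOS results.
# Assumptions (not from the paper): masks are per-frame boolean NumPy arrays,
# and boundary matching uses a simple dilation tolerance instead of the
# official evaluation code.
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J: intersection-over-union of predicted and ground-truth masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

def _boundary(mask: np.ndarray) -> np.ndarray:
    """One-pixel-wide boundary of a binary mask."""
    return np.logical_xor(mask, binary_erosion(mask))

def boundary_f_measure(pred: np.ndarray, gt: np.ndarray, tol: int = 2) -> float:
    """F: F-measure between mask boundaries, matched within `tol` pixels."""
    pred_b, gt_b = _boundary(pred), _boundary(gt)
    if pred_b.sum() == 0 and gt_b.sum() == 0:
        return 1.0
    precision = np.logical_and(pred_b, binary_dilation(gt_b, iterations=tol)).sum() / max(pred_b.sum(), 1)
    recall = np.logical_and(gt_b, binary_dilation(pred_b, iterations=tol)).sum() / max(gt_b.sum(), 1)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def j_and_f(preds, gts) -> float:
    """Mean of J and F averaged over all frames of one object."""
    j = np.mean([region_similarity(p, g) for p, g in zip(preds, gts)])
    f = np.mean([boundary_f_measure(p, g) for p, g in zip(preds, gts)])
    return (j + f) / 2
```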
Related papers
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding [57.630136434038384]
We introduce E.T. Bench (Event-Level & Time-Sensitive Video Understanding Benchmark), a large-scale benchmark for open-ended event-level video understanding.
We extensively evaluated 8 Image-LLMs and 12 Video-LLMs on our benchmark, and the results reveal that state-of-the-art models for coarse-level (video-level) understanding struggle to solve our fine-grained tasks.
Our simple but effective solution demonstrates superior performance in multiple scenarios.
arXiv Detail & Related papers (2024-09-26T17:53:04Z)
- CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track [35.70400178294299]
We introduce the solution of our team "yuanjie" for video object segmentation in the 6th LSVOS Challenge VOS Track at ECCV 2024.
We believe that our proposed CSS-Segment will perform better on videos with complex object motion and long-term presentation.
Our method achieved a J&F score of 80.84 and ranked 2nd in the 6th LSVOS Challenge VOS Track at ECCV 2024.
arXiv Detail & Related papers (2024-08-24T13:47:56Z)
- PVUW 2024 Challenge on Complex Video Understanding: Methods and Results [199.5593316907284]
We add two new tracks: the Complex Video Object Segmentation Track based on the MOSE dataset and the Motion Expression guided Video Segmentation Track based on the MeViS dataset.
In the two new tracks, we provide additional videos and annotations that feature challenging elements.
These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes.
arXiv Detail & Related papers (2024-06-24T17:38:58Z)
- 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation [81.50620771207329]
We investigate the effectiveness of static-dominant data and frame sampling on referring video object segmentation (RVOS).
Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge.
arXiv Detail & Related papers (2024-06-11T08:05:26Z)
- MOSE: A New Dataset for Video Object Segmentation in Complex Scenes [106.64327718262764]
Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence (see the protocol sketch after this list).
The state-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets.
We collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments.
arXiv Detail & Related papers (2023-02-03T17:20:03Z)
- 5th Place Solution for YouTube-VOS Challenge 2022: Video Object Segmentation [4.004851693068654]
Video object segmentation (VOS) has made significant progress with the rise of deep learning.
Similar objects are easily confused and tiny objects are difficult to find.
We propose a simple yet effective solution for this task.
arXiv Detail & Related papers (2022-06-20T06:14:27Z)
- Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge [133.80567761430584]
We collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.
OVIS consists of 296k high-quality instance masks across 901 videos with severe object occlusions.
All baseline methods encounter a significant performance degradation of about 80% in the heavily occluded object group.
arXiv Detail & Related papers (2021-11-15T17:59:03Z)
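As referenced in the MOSE entry above, the VOS track follows the semi-supervised protocol: the annotated first-frame mask of each object is given and must be propagated through the rest of the clip. The following is a minimal sketch of that loop; `SegmentationModel` and its methods are hypothetical placeholders, not an API from any of the listed papers.

```python
# Minimal sketch of the semi-supervised VOS protocol (first-frame mask given,
# then propagated frame by frame). The model class is a hypothetical stand-in
# for the memory-based systems used by the participating teams.
from typing import List
import numpy as np

class SegmentationModel:
    """Placeholder for a VOS model; a real system would match against memory."""
    def initialize(self, frame: np.ndarray, first_mask: np.ndarray) -> None:
        self.memory = [(frame, first_mask)]  # store the annotated reference frame

    def propagate(self, frame: np.ndarray) -> np.ndarray:
        # A real model would decode a new mask for `frame`; here we naively
        # repeat the last mask so the sketch stays runnable.
        last_mask = self.memory[-1][1]
        self.memory.append((frame, last_mask))
        return last_mask

def run_vos(frames: List[np.ndarray], first_mask: np.ndarray) -> List[np.ndarray]:
    """Propagate the annotated first-frame mask through the whole clip."""
    model = SegmentationModel()
    model.initialize(frames[0], first_mask)
    preds = [first_mask]
    for frame in frames[1:]:
        preds.append(model.propagate(frame))
    return preds
```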