Related papers: 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

URL: http://arxiv.org/abs/2406.07043v1
Date: Tue, 11 Jun 2024 08:05:26 GMT
Title: 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
Authors: Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng
Abstract summary: We investigate the effectiveness of static-dominant data and frame sampling on referring video object segmentation (RVOS). Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Motion Expression guided Video Segmentation (MeViS), as an emerging task, poses many new challenges to the field of referring video object segmentation (RVOS). In this technical report, we investigated and validated the effectiveness of static-dominant data and frame sampling on this challenging setting. Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge. The code is available at: https://github.com/Tapall-AI/MeViS_Track_Solution_2024.
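The J&F score quoted in the abstract is the mean of region similarity J (the Jaccard index, i.e. mask IoU) and contour accuracy F (a boundary F-measure). The sketch below is a simplified illustration of that metric, not the official MeViS or DAVIS evaluation code. It assumes binary masks given as sets of (row, col) foreground pixels and a fixed Chebyshev-distance tolerance `tol` for boundary matching, whereas DAVIS-style evaluation scales the boundary tolerance with the image size, so its numbers will differ. All function names (`jaccard`, `boundary`, `boundary_f`, `j_and_f`, `square`) are hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a foreground pixel


def jaccard(pred: Set[Pixel], gt: Set[Pixel]) -> float:
    """Region similarity J: intersection over union of two masks."""
    union = pred | gt
    if not union:
        return 1.0  # both masks empty: treat as a perfect match
    return len(pred & gt) / len(union)


def boundary(mask: Set[Pixel]) -> Set[Pixel]:
    """Foreground pixels that have at least one 4-neighbour outside the mask."""
    return {
        (r, c)
        for (r, c) in mask
        if any(n not in mask for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
    }


def _matched_fraction(src: Set[Pixel], dst: Set[Pixel], tol: int) -> float:
    """Fraction of src pixels within Chebyshev distance tol of some dst pixel."""
    if not src:
        return 1.0
    hits = sum(
        1
        for (r, c) in src
        if any(max(abs(r - r2), abs(c - c2)) <= tol for (r2, c2) in dst)
    )
    return hits / len(src)


def boundary_f(pred: Set[Pixel], gt: Set[Pixel], tol: int = 1) -> float:
    """Contour accuracy F: harmonic mean of boundary precision and recall."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp and not bg:
        return 1.0  # no boundary in either mask
    precision = _matched_fraction(bp, bg, tol)
    recall = _matched_fraction(bg, bp, tol)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(preds: List[Set[Pixel]], gts: List[Set[Pixel]], tol: int = 1) -> float:
    """Average J and F over frames, then return their mean (J&F)."""
    if not preds or len(preds) != len(gts):
        raise ValueError("need equally long, non-empty lists of masks")
    js = [jaccard(p, g) for p, g in zip(preds, gts)]
    fs = [boundary_f(p, g, tol) for p, g in zip(preds, gts)]
    return (sum(js) / len(js) + sum(fs) / len(fs)) / 2


def square(top: int, left: int, size: int) -> Set[Pixel]:
    """Helper that builds a filled square mask for the usage example."""
    return {(r, c) for r in range(top, top + size) for c in range(left, left + size)}


if __name__ == "__main__":
    gt = [square(0, 0, 4)]
    print(j_and_f([square(0, 0, 4)], gt))  # identical masks
    print(j_and_f([square(0, 1, 4)], gt))  # prediction shifted by one column
```

With `tol=1`, the one-column shift in the usage example drops J to 0.6 but leaves F at 1.0, giving a J&F of 0.8. This is why the two terms are reported together: J penalizes any pixel-level misalignment, while F rewards predictions whose contours lie close to the ground-truth contour.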

Related papers

PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
This report provides a comprehensive overview of the 4th Pixel-level Video Understanding in the Wild (PVUW) Challenge, held in conjunction with CVPR 2025. The challenge features two tracks: MOSE, which focuses on complex scene video object segmentation, and MeViS, which targets motion-guided, language-based video segmentation.
Date: 2025-04-15T16:02:47Z
AIM 2024 Sparse Neural Rendering Challenge: Methods and Results
This paper reviews the challenge on Sparse Neural Rendering that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2024. The challenge aims at producing novel camera view synthesis of diverse scenes from sparse image observations. Participants are asked to optimise objective fidelity to the ground-truth images as measured via the Peak Signal-to-Noise Ratio (PSNR) metric.
Date: 2024-09-23T14:17:40Z
LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation
This paper introduces the 6th Large-scale Video Object Segmentation (LSVOS) challenge, held in conjunction with the ECCV 2024 workshop. This year's challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). It attracted 129 registered teams from more than 20 institutes across over 8 countries.
Date: 2024-09-09T17:45:45Z
CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track
We introduce the solution of our team "yuanjie" for video object segmentation in the 6th LSVOS Challenge VOS Track at ECCV 2024. We believe that our proposed CSS-Segment will perform better on videos with complex object motion and long-term presentation. Our method achieved a J&F score of 80.84 in and test phases, and ranked 2nd in the 6th LSVOS Challenge VOS Track at ECCV 2024.
Date: 2024-08-24T13:47:56Z
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
We add two new tracks: a Complex Video Object Segmentation track based on the MOSE dataset and a Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes.
Date: 2024-06-24T17:38:58Z
2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation
Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in a video based on natural language expressions with motion descriptions. We introduce mask information obtained from a video instance segmentation model as preliminary information for temporal enhancement and employ SAM for spatial refinement. Our method achieved a J&F score of 49.92 in the validation phase and 54.20 in the test phase, securing 2nd place in the MeViS Track at the CVPR 2024 PVUW Challenge.
Date: 2024-06-20T02:16:23Z
2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation
Video Panoptic Segmentation (VPS) aims to simultaneously classify, track, and segment all objects in a video. We propose a robust integrated video panoptic segmentation solution. Our method achieves state-of-the-art performance with VPQ scores of 56.36 and 57.12 in the development and test phases, respectively.
Date: 2024-06-01T17:03:16Z
1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation
We present further improvements to the SOTA VIS method, DVIS. We introduce a denoising training strategy for the trainable tracker, allowing it to achieve more stable and accurate object tracking in complex and long videos. Our method achieves 57.9 AP and 56.0 AP in the development and test phases, respectively, and ranked 1st in the VIS track of the 5th LSVOS Challenge.
Date: 2023-08-28T08:15:43Z
1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation
Video panoptic segmentation is a challenging task that serves as the cornerstone of numerous downstream applications. We believe that the decoupling strategy proposed by DVIS enables more effective utilization of temporal information for both "thing" and "stuff" objects. Our method achieved a VPQ score of 51.4 and 53.7 in the development and test phases, respectively, and ranked 1st in the VPS track of the 2nd PVUW Challenge.
Date: 2023-06-07T01:24:48Z
The Runner-up Solution for YouTube-VIS Long Video Challenge 2022
We adopt the previously proposed online video instance segmentation method IDOL for this challenge. We use pseudo labels to further help contrastive learning, so as to obtain more temporally consistent instance embeddings. The proposed method obtains 40.2 AP on the YouTube-VIS 2022 long video dataset and was ranked second in this challenge.
Date: 2022-11-18T01:40:59Z
AIM 2020 Challenge on Video Temporal Super-Resolution
This paper reports the second AIM challenge on Video Temporal Super-Resolution (VTSR).
Date: 2020-09-28T00:10:29Z

This list is automatically generated from the titles and abstracts of the papers on this site.