The Runner-up Solution for YouTube-VIS Long Video Challenge 2022
- URL: http://arxiv.org/abs/2211.09973v1
- Date: Fri, 18 Nov 2022 01:40:59 GMT
- Title: The Runner-up Solution for YouTube-VIS Long Video Challenge 2022
- Authors: Junfeng Wu, Yi Jiang, Qihao Liu, Xiang Bai, Song Bai
- Abstract summary: We adopt the previously proposed online video instance segmentation method IDOL for this challenge.
We use pseudo labels to further help contrastive learning, so as to obtain more temporally consistent instance embeddings.
The proposed method obtains 40.2 AP on the YouTube-VIS 2022 long video dataset and was ranked second in this challenge.
- Score: 72.13080661144761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report describes our 2nd-place solution for the ECCV 2022
YouTube-VIS Long Video Challenge. We adopt the previously proposed online video
instance segmentation method IDOL for this challenge. In addition, we use
pseudo labels to further help contrastive learning, so as to obtain more
temporally consistent instance embeddings and improve tracking performance
between frames. The proposed method obtains 40.2 AP on the YouTube-VIS 2022
long video dataset and ranked second in this challenge. We hope our
simple and effective method could benefit further research.
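The abstract does not give the form of the contrastive objective. As a rough illustration only, the sketch below assumes a multi-positive log-sum-exp contrastive loss over instance embeddings: a query embedding from one frame is pulled toward embeddings of the same instance in another frame (positives) and pushed away from embeddings of other instances (negatives). The function name `contrastive_loss` and the toy vectors are hypothetical, and how pseudo labels select the positives and negatives is not shown.

```python
import math


def contrastive_loss(query, positives, negatives):
    """Assumed loss form: log(1 + sum over (k+, k-) of exp(q.k- - q.k+)).

    query:     embedding of one instance in the key frame (list of floats)
    positives: embeddings of the same instance in a reference frame
    negatives: embeddings of other instances in the reference frame
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pos_sims = [dot(query, k) for k in positives]
    neg_sims = [dot(query, k) for k in negatives]
    # Each (positive, negative) pair is penalized when the negative
    # is at least as similar to the query as the positive.
    total = sum(math.exp(n - p) for p in pos_sims for n in neg_sims)
    return math.log1p(total)


# Toy example: the query matches its positive and is orthogonal to the negative.
well_separated = contrastive_loss([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]])
# Same vectors with the roles swapped: the "positive" is now the dissimilar one.
confused = contrastive_loss([1.0, 0.0], [[0.0, 1.0]], [[1.0, 0.0]])
```

Under this assumed form, the loss is lower when the query is closer to its positives than to its negatives, which is the property that would make embeddings of the same instance consistent across frames.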
Related papers
- CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track [35.70400178294299]
We introduce the solution of our team "yuanjie" for video object segmentation in the 6-th LSVOS Challenge VOS Track at ECCV 2024.
We believe that our proposed CSS-Segment will perform better in videos of complex object motion and long-term presentation.
Our method achieved a J&F score of 80.84 in and test phases, and ranked 2nd in the 6-th LSVOS Challenge VOS Track at ECCV 2024.
arXiv Detail & Related papers (2024-08-24T13:47:56Z)
- 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation [81.50620771207329]
We investigate the effectiveness of static-dominant data and frame sampling on referring video object segmentation (RVOS).
Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge.
arXiv Detail & Related papers (2024-06-11T08:05:26Z)
- A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference [51.26551806938455]
Affordance-centric Question-driven Task Completion (AQTC) for Egocentric Assistant introduces a groundbreaking scenario.
We present a solution for enhancing video alignment to improve multi-step inference.
Our method secured 2nd place in the CVPR'2023 AQTC challenge.
arXiv Detail & Related papers (2023-06-26T04:19:33Z)
- 1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation [25.235404527487784]
Video panoptic segmentation is a challenging task that serves as the cornerstone of numerous downstream applications.
We believe that the decoupling strategy proposed by DVIS enables more effective utilization of temporal information for both "thing" and "stuff" objects.
Our method achieved VPQ scores of 51.4 and 53.7 in the development and test phases, respectively, and ranked 1st in the VPS track of the 2nd PVUW Challenge.
arXiv Detail & Related papers (2023-06-07T01:24:48Z)
- 1st Place Solutions for the UVO Challenge 2022 [26.625850534861414]
The method ranks first on the 2nd Unidentified Video Objects (UVO) challenge, achieving AR@100 of 46.8, 64.7 and 32.2 in the limited data frame track, unlimited data frame track and video track respectively.
arXiv Detail & Related papers (2022-10-18T06:54:37Z)
- AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results [110.91485363392167]
This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022.
The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video.
arXiv Detail & Related papers (2022-08-23T20:32:38Z)
- ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022 [61.81899056005645]
Given a video clip and a text query, the goal of this challenge is to locate a temporal moment of the video clip where the answer to the query can be obtained.
We propose a multi-scale cross-modal transformer and a video frame-level contrastive loss to fully uncover the correlation between language queries and video clips.
The experimental results demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2022-07-01T12:48:35Z)
- NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Dataset and Study [95.36629866768999]
This paper introduces a novel dataset for video enhancement and studies the state-of-the-art methods of the NTIRE 2021 challenge.
The challenge is the first NTIRE challenge in this direction, with three competitions, hundreds of participants and tens of proposed solutions.
We find that the NTIRE 2021 challenge advances the state-of-the-art of quality enhancement on compressed video.
arXiv Detail & Related papers (2021-04-21T22:18:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.