Technical Report for CVPR 2022 LOVEU AQTC Challenge
- URL: http://arxiv.org/abs/2206.14555v1
- Date: Wed, 29 Jun 2022 12:07:43 GMT
- Title: Technical Report for CVPR 2022 LOVEU AQTC Challenge
- Authors: Hyeonyu Kim, Jongeun Kim, Jeonghun Kang, Sanguk Park, Dongchan Park, and Taehwan Kim
- Abstract summary: This report presents the 2nd winning model for AQTC, a task newly introduced in CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenges.
This challenge poses difficulties with multi-step answers, multi-modal inputs, and diverse and changing button representations in video.
We propose a new attention mechanism for the context ground module that yields more effective feature mapping.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This technical report presents the 2nd winning model for AQTC, a task newly
introduced in CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenges. This
challenge poses difficulties with multi-step answers, multi-modal inputs, and
diverse and changing button representations in video. We address this problem
by proposing a new attention mechanism for the context ground module that
yields more effective feature mapping (a toy sketch of this idea follows the
abstract). In addition, we analyze the effect of the number of buttons and
conduct an ablation study of different step networks and video features. As a
result, we achieved the overall 2nd place in LOVEU competition track 3,
specifically the 1st place in two out of four evaluation metrics. Our code is
available at https://github.com/jaykim9870/CVPR-22_LOVEU_unipyler.
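The abstract describes the context ground module only at a high level. As a rough illustration, here is a minimal sketch of one plausible reading in PyTorch: candidate-button features cross-attend to the fused question/context features so that each button is re-expressed in terms of the grounding context. This is not the authors' released implementation; the class name ContextGroundAttention, the feature shapes, and d_model are all assumptions.

```python
import torch
import torch.nn as nn

class ContextGroundAttention(nn.Module):
    """Hypothetical sketch: ground candidate-button (answer) features in the
    question/script context via multi-head cross-attention."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, button_feats: torch.Tensor,
                context_feats: torch.Tensor) -> torch.Tensor:
        # button_feats:  (B, num_buttons, d_model) -- candidate-answer features
        # context_feats: (B, ctx_len, d_model)     -- fused question/script/video context
        grounded, _ = self.attn(query=button_feats,
                                key=context_feats, value=context_feats)
        # Residual connection keeps the raw button representation accessible.
        return self.norm(button_feats + grounded)

# Toy usage with random features.
module = ContextGroundAttention()
buttons = torch.randn(2, 6, 512)   # 2 samples, 6 candidate buttons
context = torch.randn(2, 40, 512)  # 40 fused context tokens
print(module(buttons, context).shape)  # torch.Size([2, 6, 512])
```

The residual-plus-LayerNorm pattern is a standard transformer choice rather than something the report specifies; in the real model the context would come from the upstream encoders, not random tensors.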
Related papers
- 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation (2024-06-11)
We investigate the effectiveness of static-dominant data and frame sampling on referring video object segmentation (RVOS).
Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge.
- The Runner-up Solution for YouTube-VIS Long Video Challenge 2022 (2022-11-18)
We adopt the previously proposed online video instance segmentation method IDOL for this challenge.
We use pseudo labels to further help contrastive learning, so as to obtain more temporally consistent instance embeddings (a toy sketch of this idea follows the entry).
The proposed method obtains 40.2 AP on the YouTube-VIS 2022 long video dataset and was ranked second in this challenge.
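IDOL's actual loss is more involved; the following is only a minimal sketch of the general idea named in the summary: treating detections that share a pseudo identity across frames as positives in a supervised-contrastive loss. The function name and the way pseudo ids are assigned are assumptions, not the challenge entry's code.

```python
import torch
import torch.nn.functional as F

def pseudo_label_contrastive(emb: torch.Tensor, pseudo_ids: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """Hypothetical loss: embeddings of detections sharing a pseudo identity
    (e.g. linked across frames by a tracker) are pulled together."""
    emb = F.normalize(emb, dim=-1)                       # (N, D)
    logits = emb @ emb.t() / temperature                 # (N, N) cosine logits
    same = pseudo_ids.unsqueeze(0) == pseudo_ids.unsqueeze(1)
    eye = torch.eye(len(emb), dtype=torch.bool)
    pos = same & ~eye                                    # positives: same id, not self
    log_prob = F.log_softmax(logits.masked_fill(eye, float('-inf')), dim=1)
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return per_anchor[pos.any(dim=1)].mean()             # anchors with >=1 positive

# Toy usage: 4 detections from two frames, pseudo ids assigned by tracking.
emb = torch.randn(4, 256)
ids = torch.tensor([0, 1, 0, 1])   # detections 0/2 and 1/3 share identities
print(pseudo_label_contrastive(emb, ids))
```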
- VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection (2022-10-20)
We introduce two high-quality video benchmarks, namely QV-Pipe and CCTV-Pipe, for anomaly inspection in real-world urban pipe systems.
In this report, we describe the details of these benchmarks, the problem definitions of competition tracks, the evaluation metric, and the result summary.
- Exploiting Feature Diversity for Make-up Temporal Video Grounding (2022-08-12)
This report presents the 3rd-place winning solution for MTVG, a new task introduced in the 4th Person in Context (PIC) Challenge at ACM MM 2022.
MTVG aims at localizing the temporal boundary of the step in an untrimmed video based on a textual description.
- ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022 (2022-07-01)
Given a video clip and a text query, the goal of this challenge is to locate a temporal moment of the video clip where the answer to the query can be obtained.
We propose a multi-scale cross-modal transformer and a video frame-level contrastive loss to fully uncover the correlation between language queries and video clips (a toy sketch of such a loss follows the entry).
The experimental results demonstrate the effectiveness of our method.
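As a rough illustration of a frame-level contrastive objective (not the ReLER team's actual code), one can treat frames that fall inside the annotated moment as positives for the sentence query and all other frames as negatives; every name below is an assumption.

```python
import torch
import torch.nn.functional as F

def frame_level_contrastive(query_emb: torch.Tensor, frame_embs: torch.Tensor,
                            inside_mask: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """Hypothetical sketch: frames inside the annotated moment are positives
    for the language query; the remaining frames act as negatives."""
    q = F.normalize(query_emb, dim=-1)                   # (D,)
    f = F.normalize(frame_embs, dim=-1)                  # (T, D)
    log_prob = F.log_softmax(f @ q / temperature, dim=0) # (T,)
    return -log_prob[inside_mask].mean()                 # average over positive frames

# Toy usage: a 10-frame clip where frames 3..5 fall inside the target moment.
query = torch.randn(128)
frames = torch.randn(10, 128)
mask = torch.zeros(10, dtype=torch.bool)
mask[3:6] = True
print(frame_level_contrastive(query, frames, mask))
```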
- NTIRE 2022 Challenge on Perceptual Image Quality Assessment (2022-06-23)
This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA).
The challenge addresses the emerging problem of assessing the quality of images produced by perceptual image processing algorithms.
The winning method demonstrates state-of-the-art performance.
- Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA (2020-05-13)
We propose a video question answering model which effectively integrates multi-modal input sources and finds the temporally relevant information to answer questions.
Our model also comprises dual-level attention (word/object and frame level), multi-head self- and cross-integration of different sources (video and dense captions), and gates that pass the more relevant information onward (a toy gating sketch follows the entry).
We evaluate our model on the challenging TVQA dataset, where each of our model components provides significant gains, and our overall model outperforms the state-of-the-art by a large margin.
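The gating idea lends itself to a compact sketch: score each frame against the question and soft-select the relevant frames before pooling. This is a hypothetical reconstruction; FrameSelectionGate and its dimensions are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class FrameSelectionGate(nn.Module):
    """Hypothetical frame-selection gate: scores each frame against the
    question and soft-selects relevant frames before pooling."""

    def __init__(self, d_model: int = 512):
        super().__init__()
        self.score = nn.Linear(2 * d_model, 1)

    def forward(self, frame_feats: torch.Tensor, question_feat: torch.Tensor):
        # frame_feats: (B, T, D), question_feat: (B, D)
        q = question_feat.unsqueeze(1).expand_as(frame_feats)                   # (B, T, D)
        gate = torch.sigmoid(self.score(torch.cat([frame_feats, q], dim=-1)))   # (B, T, 1)
        # Weighted average keeps only the frames the gate deems relevant.
        pooled = (gate * frame_feats).sum(dim=1) / gate.sum(dim=1).clamp(min=1e-6)
        return pooled, gate.squeeze(-1)

# Toy usage.
gate = FrameSelectionGate()
pooled, weights = gate(torch.randn(2, 16, 512), torch.randn(2, 512))
print(pooled.shape, weights.shape)  # torch.Size([2, 512]) torch.Size([2, 16])
```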
- NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results (2020-05-05)
This paper reviews the NTIRE 2020 challenge on video quality mapping (VQM).
The challenge includes both a supervised track (track 1) and a weakly-supervised track (track 2) for two benchmark datasets.
For track 1, in total 7 teams competed in the final test phase, demonstrating novel and effective solutions to the problem.
For track 2, some existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.
This list is automatically generated from the titles and abstracts of the papers on this site.