Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
- URL: http://arxiv.org/abs/2411.19941v1
- Date: Fri, 29 Nov 2024 18:57:25 GMT
- Title: Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
- Authors: Joseph Heyward, João Carreira, Dima Damen, Andrew Zisserman, Viorica Pătrăucean,
- Abstract summary: We organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024.
The goal was to benchmarking state-of-the-art video models and measuring the progress since last year using the Perception Test benchmark.
This year, the challenge had seven tracks and covered low-level and high-level tasks, with language and non-language interfaces, across video, audio, and text modalities.
The additional track covered hour-long video understanding and introduced a novel video QA benchmark 1h-walk VQA.
- Score: 64.16672247204997
- License:
- Abstract: Following the successful 2023 edition, we organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the goal of benchmarking state-of-the-art video models and measuring the progress since last year using the Perception Test benchmark. This year, the challenge had seven tracks (up from six last year) and covered low-level and high-level tasks, with language and non-language interfaces, across video, audio, and text modalities; the additional track covered hour-long video understanding and introduced a novel video QA benchmark 1h-walk VQA. Overall, the tasks in the different tracks were: object tracking, point tracking, temporal action localisation, temporal sound localisation, multiple-choice video question-answering, grounded video question-answering, and hour-long video question-answering. We summarise in this report the challenge tasks and results, and introduce in detail the novel hour-long video QA benchmark 1h-walk VQA.
Related papers
- HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding [52.696422425058245]
We build a large-scale hour-long long video benchmark, HLV-1K, designed to evaluate long video understanding models.
HLV-1K comprises 1009 hour-long videos with 14,847 high-quality question answering (QA) and multi-choice question asnwering (MCQA)
We evaluate our benchmark using existing state-of-the-art methods and demonstrate its value for testing deep long video understanding capabilities at different levels and for various tasks.
arXiv Detail & Related papers (2025-01-03T05:32:37Z) - AIM 2024 Challenge on Video Super-Resolution Quality Assessment: Methods and Results [76.64868221556145]
This paper presents the Video Super-Resolution (SR) Quality Assessment (QA) Challenge that was part of the Advances in Image Manipulation (AIM) workshop.
The task of this challenge was to develop an objective QA method for videos upscaled 2x and 4x by modern image- and video-SR algorithms.
The goal was to advance the state-of-the-art in SR QA, which had proven to be a challenging problem with limited applicability of traditional QA methods.
arXiv Detail & Related papers (2024-10-05T16:42:23Z) - Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting [15.161997580529075]
This paper explores the novel challenge of VideoQA within a continual learning framework.
We propose Collaborative Prompting (ColPro), which integrates specific question constraint prompting, knowledge acquisition prompting, and visual temporal awareness prompting.
Experimental results on the NExT-QA and DramaQA datasets show that ColPro achieves superior performance compared to existing approaches.
arXiv Detail & Related papers (2024-10-01T15:07:07Z) - AIM 2024 Challenge on Video Saliency Prediction: Methods and Results [105.09572982350532]
This paper reviews the Challenge on Video Saliency Prediction at AIM 2024.
The goal of the participants was to develop a method for predicting accurate saliency maps for the provided set of video sequences.
arXiv Detail & Related papers (2024-09-23T08:59:22Z) - First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge [4.075139470537149]
We present our first-place solution to the Multiple-choice Video Question Answering track of The Second Perception Test Challenge.
This competition posed a complex video understanding task, requiring models to accurately comprehend and answer questions about video content.
arXiv Detail & Related papers (2024-09-20T14:31:13Z) - NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results [216.73187673659675]
This paper reviews the NTIRE 2024 Challenge on Shortform Video Quality Assessment (S-UGC VQA)
The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing.
The purpose is to build new benchmarks and advance the development of S-UGC VQA.
arXiv Detail & Related papers (2024-04-17T12:26:13Z) - Perception Test 2023: A Summary of the First Challenge And Outcome [67.0525378209708]
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023.
The goal was to benchmarking state-of-the-art video models on the recently proposed Perception Test benchmark.
We summarise in this report the task descriptions, metrics, baselines, and results.
arXiv Detail & Related papers (2023-12-20T15:12:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.