VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe
Inspection
- URL: http://arxiv.org/abs/2210.11158v1
- Date: Thu, 20 Oct 2022 10:52:49 GMT
- Authors: Yi Liu, Xuan Zhang, Ying Li, Guixin Liang, Yabing Jiang, Lixia Qiu,
Haiping Tang, Fei Xie, Wei Yao, Yi Dai, Yu Qiao, Yali Wang
- Abstract summary: We introduce two high-quality video benchmarks, namely QV-Pipe and CCTV-Pipe, for anomaly inspection in real-world urban pipe systems.
In this report, we describe the details of these benchmarks, the problem definitions of the competition tracks, the evaluation metric, and the result summary.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video understanding is an important problem in computer vision. Currently,
the well-studied task in this research is human action recognition, where the
clips are manually trimmed from the long videos, and a single class of human
action is assumed for each clip. However, we may face more complicated
scenarios in industrial applications. For example, in real-world urban
pipe systems, anomaly defects are fine-grained, multi-labeled, and domain-relevant.
To recognize them correctly, we need to understand the detailed video content.
For this reason, we propose to advance research areas of video understanding,
with a shift from traditional action recognition to industrial anomaly
analysis. In particular, we introduce two high-quality video benchmarks, namely
QV-Pipe and CCTV-Pipe, for anomaly inspection in real-world urban pipe
systems. Based on these new datasets, we will host two competitions including
(1) Video Defect Classification on QV-Pipe and (2) Temporal Defect Localization
on CCTV-Pipe. In this report, we describe the details of these benchmarks, the
problem definitions of competition tracks, the evaluation metric, and the
result summary. We expect that this competition will bring new opportunities
and challenges for video understanding in smart cities and beyond. The details of
our VideoPipe challenge can be found at https://videopipe.github.io.
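The Video Defect Classification track described above is a multi-label recognition problem. As an illustrative, unofficial sketch of how such a track might be scored (mean Average Precision over defect classes is a common choice for multi-label video classification; the function names and data layout here are assumptions, not the challenge's actual metric):

```python
def average_precision(scores, labels):
    """AP for one class: scores are predicted confidences, labels are 0/1.

    Videos are ranked by descending score; precision is accumulated at
    each rank where a true positive occurs.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    # max(hits, 1) guards against classes with no positive videos
    return sum(precisions) / max(hits, 1)

def mean_average_precision(score_matrix, label_matrix):
    """mAP over defect classes; each row is a video, each column a class."""
    num_classes = len(score_matrix[0])
    aps = [
        average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in label_matrix],
        )
        for c in range(num_classes)
    ]
    return sum(aps) / num_classes

# Toy usage: two videos, two hypothetical defect classes.
scores = [[0.9, 0.1], [0.2, 0.8]]
labels = [[1, 0], [0, 1]]
print(mean_average_precision(scores, labels))  # perfect ranking -> 1.0
```

Per-class AP rewards ranking the truly defective videos above the clean ones, which matters more than raw accuracy when defect labels are sparse and multi-labeled.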
Related papers
- Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark [64.16672247204997]
We organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024.
The goal was to benchmark state-of-the-art video models and measure the progress since last year using the Perception Test benchmark.
This year, the challenge had seven tracks and covered low-level and high-level tasks, with language and non-language interfaces, across video, audio, and text modalities.
The additional track covered hour-long video understanding and introduced a novel video QA benchmark 1h-walk VQA.
arXiv Detail & Related papers (2024-11-29T18:57:25Z) - VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs [64.60035916955837]
VANE-Bench is a benchmark designed to assess the proficiency of Video-LMMs in detecting anomalies and inconsistencies in videos.
Our dataset comprises an array of videos synthetically generated using existing state-of-the-art text-to-video generation models.
We evaluate nine existing Video-LMMs, both open- and closed-source, on this benchmark task and find that most of the models have difficulty effectively identifying the subtle anomalies.
arXiv Detail & Related papers (2024-06-14T17:59:01Z) - A Survey of Video Datasets for Grounded Event Understanding [34.11140286628736]
Multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding.
We survey 105 video datasets that require event understanding capability.
arXiv Detail & Related papers (2024-06-14T00:36:55Z) - Towards Video Anomaly Retrieval from Video Anomaly Detection: New
Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has been paid increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos via cross-modal queries.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z) - GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for
Real-time Soccer Commentary Generation [75.60413443783953]
We present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples for proposing a challenging new task setting as Knowledge-grounded Video Captioning (KGVC).
Our data and code are available at https://github.com/THU-KEG/goal.
arXiv Detail & Related papers (2023-03-26T08:43:36Z) - Technical Report for CVPR 2022 LOVEU AQTC Challenge [3.614550981030065]
This report presents the 2nd winning model for AQTC, a task newly introduced in CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenges.
This challenge poses difficulties with multi-step answers, multi-modal inputs, and diverse and changing button representations in videos.
We propose a new context-ground module with an attention mechanism for more effective feature mapping.
arXiv Detail & Related papers (2022-06-29T12:07:43Z) - Fill-in-the-blank as a Challenging Video Understanding Evaluation
Framework [19.031957183047048]
We introduce a novel dataset consisting of 28,000 videos and fill-in-the-blank tests.
We show that both a multimodal model and a strong language model have a large gap with human performance.
arXiv Detail & Related papers (2021-04-09T04:00:10Z) - NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results [131.05847851975236]
This paper reviews the NTIRE 2020 challenge on video quality mapping (VQM).
The challenge includes both a supervised track (track 1) and a weakly-supervised track (track 2) for two benchmark datasets.
For track 1, in total 7 teams competed in the final test phase, demonstrating novel and effective solutions to the problem.
For track 2, some existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.
arXiv Detail & Related papers (2020-05-05T15:45:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.