VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection
- URL: http://arxiv.org/abs/2210.11158v1
- Date: Thu, 20 Oct 2022 10:52:49 GMT
- Title: VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection
- Authors: Yi Liu, Xuan Zhang, Ying Li, Guixin Liang, Yabing Jiang, Lixia Qiu,
Haiping Tang, Fei Xie, Wei Yao, Yi Dai, Yu Qiao, Yali Wang
- Abstract summary: We introduce two high-quality video benchmarks, namely QV-Pipe and CCTV-Pipe, for anomaly inspection in real-world urban pipe systems. In this report, we describe the details of these benchmarks, the problem definitions of the competition tracks, the evaluation metrics, and a summary of the results.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video understanding is an important problem in computer vision.
Currently, the most well-studied task in this area is human action
recognition, where clips are manually trimmed from long videos and a single
class of human action is assumed for each clip. However, industrial
applications present more complicated scenarios. For example, in real-world
urban pipe systems, anomaly defects are fine-grained, multi-labeled, and
domain-relevant. Recognizing them correctly requires understanding the
detailed video content. For this reason, we propose to advance video
understanding research with a shift from traditional action recognition to
industrial anomaly analysis. In particular, we introduce two high-quality
video benchmarks, QV-Pipe and CCTV-Pipe, for anomaly inspection in real-world
urban pipe systems. Based on these new datasets, we host two competitions:
(1) Video Defect Classification on QV-Pipe and (2) Temporal Defect
Localization on CCTV-Pipe. In this report, we describe the details of these
benchmarks, the problem definitions of the competition tracks, the evaluation
metrics, and a summary of the results. We expect this competition to bring
new opportunities and challenges for video understanding in smart cities and
beyond. Details of our VideoPipe challenge can be found at
https://videopipe.github.io.
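The abstract refers to the tracks' evaluation metrics without reproducing
them here. As a rough, unofficial illustration of how such tracks are
commonly scored, the sketch below computes class-wise mean average precision
(mAP) for multi-label defect classification (Track 1) and temporal IoU for
defect localization (Track 2). The metric choices, function names, and array
shapes are assumptions for illustration, not the challenge's official scoring
code.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean precision at the rank of each positive video."""
    order = np.argsort(-scores)        # sort videos by descending confidence
    labels = labels[order]
    if labels.sum() == 0:              # no positives for this class
        return 0.0
    cum_tp = np.cumsum(labels)
    precision = cum_tp / (np.arange(len(labels)) + 1)
    return float(precision[labels == 1].mean())

def mean_average_precision(scores, labels):
    """mAP over defect classes; scores and labels are (n_videos, n_classes)."""
    return float(np.mean([average_precision(scores[:, c], labels[:, c])
                          for c in range(labels.shape[1])]))

def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

A localization track would typically match predicted defect segments to
ground truth at one or more temporal-IoU thresholds before aggregating
per-class scores; the actual VideoPipe protocol is specified on the challenge
site.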
Related papers
- VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
  (arXiv, 2024-06-14). A benchmark designed to assess the proficiency of
  Video-LMMs in detecting anomalies and inconsistencies in videos. The dataset
  comprises videos synthetically generated using existing state-of-the-art
  text-to-video generation models. Nine Video-LMMs, both open- and
  closed-source, are evaluated on this task, and most have difficulty
  identifying the subtle anomalies.
- A Survey of Video Datasets for Grounded Event Understanding
  (arXiv, 2024-06-14). Argues that multimodal AI systems must be capable of
  well-rounded, common-sense reasoning akin to human visual understanding, and
  surveys 105 video datasets that require event-understanding capability.
- Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks
  and Model (arXiv, 2023-07-24). Video anomaly detection (VAD) has received
  increasing attention due to its potential applications; video anomaly
  retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos
  across modalities. Presents two benchmarks, UCFCrime-AR and XD-Violence,
  constructed on top of prevalent anomaly datasets.
- GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for
  Real-time Soccer Commentary Generation (arXiv, 2023-03-26). A benchmark of
  over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples,
  proposing Knowledge-grounded Video Captioning (KGVC) as a challenging new
  task setting. Data and code are available at https://github.com/THU-KEG/goal.
- Technical Report for CVPR 2022 LOVEU AQTC Challenge (arXiv, 2022-06-29).
  Presents the second-place model for AQTC, a task newly introduced in the
  CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenges. The task is
  difficult because answers are multi-step, the inputs are multi-modal, and
  button representations in the videos are diverse and changing. The authors
  propose a new context-grounding attention mechanism for more effective
  feature mapping.
- Fill-in-the-blank as a Challenging Video Understanding Evaluation Framework
  (arXiv, 2021-04-09). Introduces a novel dataset consisting of 28,000 videos
  and fill-in-the-blank tests, and shows that both a multimodal model and a
  strong language model fall well short of human performance.
- NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results
  (arXiv, 2020-05-05). Reviews the NTIRE 2020 challenge on video quality
  mapping (VQM), which includes a supervised track (track 1) and a
  weakly-supervised track (track 2) on two benchmark datasets. In track 1,
  7 teams competed in the final test phase, demonstrating novel and effective
  solutions; in track 2, existing methods were evaluated, showing promising
  solutions to the weakly-supervised VQM problem.
- VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
  (arXiv, 2020-03-25). Introduces Video-and-Language Inference, a new task for
  joint multimodal understanding of video and text: given a video clip with
  aligned subtitles as premise, paired with a natural-language hypothesis based
  on the video content, a model must infer whether the hypothesis is entailed
  or contradicted by the clip. The large-scale Violin (VIdeO-and-Language
  INference) dataset consists of 95,322 video-hypothesis pairs from 15,887
  video clips.