VIDI: A Video Dataset of Incidents
- URL: http://arxiv.org/abs/2205.13277v1
- Date: Thu, 26 May 2022 11:30:59 GMT
- Title: VIDI: A Video Dataset of Incidents
- Authors: Duygu Sesver, Alp Eren Gençoğlu, Çağrı Emre Yıldız, Zehra Günindi, Faeze Habibi, Ziya Ata Yazıcı, Hazım Kemal Ekenel
- Abstract summary: We present a video dataset, Video dataset of Incidents, VIDI, that contains 4,534 video clips corresponding to 43 incident categories.
To increase diversity, the videos were searched for in several languages.
We show that recent state-of-the-art methods improve incident classification accuracy.
- Score: 5.002873541686896
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Automatic detection of natural disasters and incidents has become more
important as a tool for fast response. There have been many studies to detect
incidents using still images and text. However, the number of approaches that
exploit temporal information is rather limited. One of the main reasons for
this is that a diverse video dataset with various incident types does not
exist. To address this need, in this paper we present a video dataset, Video
Dataset of Incidents, VIDI, that contains 4,534 video clips corresponding to 43
incident categories. Each incident class has around 100 videos with a duration
of ten seconds on average. To increase diversity, the videos have been searched
in several languages. To assess the performance of the recent state-of-the-art
approaches, Vision Transformer and TimeSformer, as well as to explore the
contribution of video-based information for incident classification, we
performed benchmark experiments on VIDI and the Incidents Dataset. We show
that these recent methods improve incident classification accuracy and that
employing video data is highly beneficial for the task: using video data
increases the top-1 accuracy to 76.56%, from the 67.37% obtained with a single
frame (a minimal evaluation sketch follows the abstract). VIDI will be made
publicly available.
Additional materials can be found at the following link:
https://github.com/vididataset/VIDI.
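As a rough illustration of the single-frame versus video comparison reported above, here is a minimal evaluation sketch in PyTorch. The model and dataloader are placeholders, and this is not the authors' Vision Transformer or TimeSformer setup; it only shows how the two top-1 accuracy numbers could be measured.

```python
# Minimal sketch: top-1 accuracy over full clips or a single (middle) frame.
# `model` and `loader` are assumed placeholders, not the paper's pipeline.
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu", single_frame=False):
    model.eval()
    correct = total = 0
    for clips, labels in loader:                   # clips: (B, T, C, H, W)
        if single_frame:
            clips = clips[:, clips.shape[1] // 2]  # keep middle frame: (B, C, H, W)
        logits = model(clips.to(device))
        correct += (logits.argmax(dim=1) == labels.to(device)).sum().item()
        total += labels.numel()
    return correct / total
```

Under this setup, the difference between `top1_accuracy(video_model, loader)` and `top1_accuracy(frame_model, loader, single_frame=True)` corresponds to the 76.56% versus 67.37% gap reported in the abstract.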
Related papers
- A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles [0.0]
This paper presents a dataset covering 4,011 videos about the ongoing outbreak of measles, published on 264 websites between January 1, 2024, and May 31, 2024.
YouTube and TikTok account for 48.6% and 15.2% of the videos, respectively.
For each video, the dataset records the video URL, the title and description of the post, and the publication date as separate attributes.
arXiv Detail & Related papers (2024-06-11T20:14:22Z)
- Vript: A Video Is Worth Thousands of Words [54.815686588378156]
Vript is an annotated corpus of 12K high-resolution videos, offering detailed, dense, and script-like captions for over 420K clips.
Each clip has a caption of 145 words, which is over 10x longer than most video-text datasets.
Alongside the corpus, the authors train a model capable of end-to-end generation of dense and detailed captions for long videos.
arXiv Detail & Related papers (2024-06-10T06:17:55Z)
- Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model [70.97446870672069]
Video anomaly detection (VAD) has received increasing attention due to its potential applications.
Video Anomaly Retrieval (VAR) aims to pragmatically retrieve relevant anomalous videos across modalities; a minimal retrieval sketch follows this entry.
We present two benchmarks, UCFCrime-AR and XD-Violence, constructed on top of prevalent anomaly datasets.
arXiv Detail & Related papers (2023-07-24T06:22:37Z)
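As a loose illustration of the cross-modal retrieval idea in VAR, the sketch below ranks videos by cosine similarity between a text-query embedding and precomputed video embeddings. The encoders, dimensions, and random inputs are assumptions for demonstration, not the paper's model.

```python
# Hedged sketch: rank videos by cosine similarity to a text-query embedding.
import numpy as np

def retrieve(query_emb: np.ndarray, video_embs: np.ndarray, k: int = 5):
    """Return indices of the k videos most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    scores = v @ q                    # cosine similarity per video
    return np.argsort(-scores)[:k]    # top-k video indices

# Toy usage: random vectors stand in for real text/video encoder outputs.
rng = np.random.default_rng(0)
print(retrieve(rng.normal(size=256), rng.normal(size=(100, 256))))
```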
- Overlooked Video Classification in Weakly Supervised Video Anomaly Detection [4.162019309587633]
We explicitly study the power of video-classification supervision using a BERT or an LSTM.
With this BERT or LSTM, the CNN features of all snippets of a video can be aggregated into a single feature that is used for video classification, as sketched after this entry.
This simple yet powerful video classification supervision, combined with the MIL framework, brings extraordinary performance improvements on all three major video anomaly detection datasets.
arXiv Detail & Related papers (2022-10-13T03:00:22Z)
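A minimal sketch of the aggregation idea above: an LSTM collapses per-snippet CNN features into one video-level feature, which a linear head classifies. The dimensions and class count are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: aggregate per-snippet CNN features into one video-level
# feature with an LSTM, then classify the whole video.
import torch
import torch.nn as nn

class VideoClassifier(nn.Module):
    def __init__(self, feat_dim=1024, hidden=512, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, snippet_feats):          # (B, num_snippets, feat_dim)
        _, (h_n, _) = self.lstm(snippet_feats)
        return self.head(h_n[-1])              # one prediction per video

logits = VideoClassifier()(torch.randn(4, 32, 1024))  # 4 videos, 32 snippets each
```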
- Anomaly detection in surveillance videos using transformer based attention model [3.2968779106235586]
This research suggests using a weakly supervised strategy to avoid annotating anomalous segments in training videos.
The proposed framework is validated on a real-world dataset, the ShanghaiTech Campus dataset.
arXiv Detail & Related papers (2022-06-03T12:19:39Z)
- VPN: Video Provenance Network for Robust Content Attribution [72.12494245048504]
We present VPN - a content attribution method for recovering provenance information from videos shared online.
We learn a robust search embedding for matching such videos, using full-length or truncated video queries.
Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
arXiv Detail & Related papers (2021-09-21T09:07:05Z)
- QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries [89.24431389933703]
We present the Query-based Video Highlights (QVHighlights) dataset.
It consists of over 10,000 YouTube videos, covering a wide range of topics.
Each video in the dataset is annotated with: (1) a human-written free-form natural-language query, (2) the relevant moments in the video with respect to the query, and (3) five-point-scale saliency scores for all query-relevant clips (an illustrative record follows this entry).
arXiv Detail & Related papers (2021-07-20T16:42:58Z)
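To make the QVHighlights annotation scheme above concrete, here is an illustrative record. The field names are assumptions chosen for readability, not necessarily the dataset's exact JSON schema.

```python
# Hypothetical QVHighlights-style record; field names are illustrative.
annotation = {
    "video_id": "example_video",              # source YouTube video
    "query": "A chef kneads dough by hand.",  # human-written free-form NL query
    "relevant_windows": [[12.0, 38.0]],       # relevant moment spans, in seconds
    "saliency_scores": [3, 4, 5, 4],          # five-point scale per relevant clip
}
```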
- A new Video Synopsis Based Approach Using Stereo Camera [0.5801044612920815]
A new method for anomaly detection with object-based unsupervised learning has been developed.
Using this method, the video data are processed at the pixel level and the result is produced as a video segment.
The developed model was tested and verified separately for single-camera and dual-camera systems.
arXiv Detail & Related papers (2021-06-23T12:57:47Z)
- VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation [124.02278735049235]
The VALUE benchmark aims to cover a broad range of video genres, video lengths, data volumes, and task difficulty levels.
We evaluate various baseline methods with and without large-scale VidL pre-training.
The significant gap between our best model and human performance calls for further work on advanced VidL models.
arXiv Detail & Related papers (2021-06-08T18:34:21Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects; a minimal sketch follows this entry.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
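The transfer-learning scheme above could look roughly like the sketch below: freeze a backbone trained on abundant base-class data and fine-tune only a lightweight head on the few novel-class clips. The module choices (a ResNet-18 backbone, a linear head) are assumptions, not the paper's detector.

```python
# Hedged sketch of few-shot transfer learning: freeze a base-trained
# backbone and train only a small novel-class head.
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18()          # stand-in for a model trained on base classes
backbone.fc = nn.Identity()    # expose the 512-d penultimate features
for p in backbone.parameters():
    p.requires_grad = False    # keep base-class knowledge frozen

novel_head = nn.Linear(512, 5)               # trained on a few novel-class clips
model = nn.Sequential(backbone, novel_head)  # frozen features + trainable head
```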