DSText V2: A Comprehensive Video Text Spotting Dataset for Dense and
Small Text
- URL: http://arxiv.org/abs/2312.01938v1
- Date: Wed, 29 Nov 2023 09:13:27 GMT
- Title: DSText V2: A Comprehensive Video Text Spotting Dataset for Dense and
Small Text
- Authors: Weijia Wu, Yiming Zhang, Yefei He, Luoming Zhang, Zhenyu Lou, Hong
Zhou, and Xiang Bai
- Abstract summary: We establish a video text reading benchmark, named DSText V2, which focuses on Dense and Small text reading challenges in the video.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
A high proportion of small texts, coupled with blurriness and distortion in the video, brings further challenges.
- Score: 46.177941541282756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, video text detection, tracking, and recognition in natural scenes
are becoming very popular in the computer vision community. However, most
existing algorithms and benchmarks focus on common text cases (e.g., normal
size, density) and single scenario, while ignoring extreme video text
challenges, i.e., dense and small text in various scenarios. In this paper, we
establish a video text reading benchmark, named DSText V2, which focuses on
Dense and Small text reading challenges in the video with various scenarios.
Compared with previous datasets, the proposed dataset mainly includes three
new challenges: 1) dense video texts, a new challenge for video text spotters
to track and read; 2) a high proportion of small texts, which, coupled with
blurriness and distortion in the video, brings further challenges; and 3)
various new scenarios, e.g., Game and Sports. The proposed DSText V2 includes
140 video clips from 7 open scenarios, supporting three tasks, i.e., video text
detection (Task 1), video text tracking (Task 2), and end-to-end video text
spotting (Task 3). In this article, we describe detailed statistical
information of the dataset, tasks, evaluation protocols, and the results
summaries. Most importantly, a thorough investigation and analysis targeting
three unique challenges derived from our dataset are provided, aiming to
provide new insights. Moreover, we hope the benchmark will promote video text
research in the community. DSText V2 is built upon DSText V1, which was
previously introduced to organize the ICDAR 2023 competition for dense and
small video text.
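
The abstract mentions evaluation protocols for the three tasks without detailing them here. As an illustration only, below is a minimal Python sketch of IoU-based frame-level detection matching, a common building block of video text detection evaluation (Task 1). The function names and the axis-aligned box format are assumptions made for brevity; the official DSText V2 protocol (which operates on quadrilateral annotations) may differ.

```python
# Minimal sketch of IoU-based frame-level detection matching, in the spirit
# of Task 1 (video text detection). All names are illustrative and not part
# of the official DSText V2 evaluation code.

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def frame_fscore(gt_boxes, pred_boxes, thresh=0.5):
    """Greedy one-to-one matching at an IoU threshold; returns (P, R, F)."""
    matched_gt = set()
    tp = 0
    for pred in pred_boxes:
        best_j, best_iou = -1, thresh
        for j, gt in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            score = iou(pred, gt)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j >= 0:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Example: only the first prediction overlaps a ground-truth box enough.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(frame_fscore(gt, pred))  # (0.5, 0.5, 0.5)
```

Tracking (Task 2) and end-to-end spotting (Task 3) additionally require consistent instance IDs across frames and correct transcriptions, which this per-frame sketch does not cover.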
Related papers
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z) - Composed Video Retrieval via Enriched Context and Discriminative Embeddings [118.66322242183249]
Composed video retrieval (CoVR) is a challenging problem in computer vision.
We introduce a novel CoVR framework that leverages detailed language descriptions to explicitly encode query-specific contextual information.
Our approach achieves gains of up to around 7% in recall@K=1 score (a minimal sketch of this metric appears after this list).
arXiv Detail & Related papers (2024-03-25T17:59:03Z) - A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension [49.74647080936875]
We introduce a large-scale and cross-modal Video Retrieval dataset with text reading comprehension, TextVR.
The proposed TextVR requires one unified cross-modal model to recognize and comprehend texts, relate them to the visual context, and decide what text semantic information is vital for the video retrieval task.
arXiv Detail & Related papers (2023-05-05T08:00:14Z) - ICDAR 2023 Video Text Reading Competition for Dense and Small Text [61.138557702185274]
We establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks, i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2).
arXiv Detail & Related papers (2023-04-10T04:20:34Z) - A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text
Spotter with Transformer [12.167938646139705]
We introduce a large-scale, bilingual, open-world video text benchmark dataset (BOVText).
Firstly, we provide 2,000+ videos with more than 1,750,000 frames, 25 times larger than the previously largest dataset with incidental text in videos.
Secondly, our dataset covers 30+ open categories with a wide variety of scenarios, e.g., Life Vlog, Driving, Movie, etc.
arXiv Detail & Related papers (2021-12-09T13:21:26Z) - Bridging Vision and Language from the Video-to-Text Perspective: A
Comprehensive Review [1.0520692160489133]
This review categorizes and describes the state-of-the-art techniques for the video-to-text problem.
It covers the main video-to-text methods and the ways to evaluate their performance.
State-of-the-art techniques are still a long way from achieving human-like performance in generating or retrieving video descriptions.
arXiv Detail & Related papers (2021-03-27T02:12:28Z) - RoadText-1K: Text Detection & Recognition Dataset for Driving Videos [26.614671477004375]
This paper introduces a new "RoadText-1K" dataset for text in driving videos.
The dataset is 20 times larger than the existing largest dataset for text in videos.
arXiv Detail & Related papers (2020-05-19T14:51:25Z) - Text Synopsis Generation for Egocentric Videos [72.52130695707008]
We propose to generate a textual synopsis, consisting of a few sentences describing the most important events in a long egocentric video.
Users can read the short text to gain insight about the video, and more importantly, efficiently search through the content of a large video database.
arXiv Detail & Related papers (2020-05-08T00:28:00Z)