ICDAR 2023 Video Text Reading Competition for Dense and Small Text
- URL: http://arxiv.org/abs/2304.04376v1
- Date: Mon, 10 Apr 2023 04:20:34 GMT
- Title: ICDAR 2023 Video Text Reading Competition for Dense and Small Text
- Authors: Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou,
Umapada Pal, Dimosthenis Karatzas, Xiang Bai
- Abstract summary: We establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks (i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2)).
- Score: 61.138557702185274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video text detection, tracking, and recognition in natural scenes
have recently become very popular in the computer vision community. However, most
existing algorithms and benchmarks focus on common text cases (e.g., normal
size and density) and single scenarios, while ignoring extreme video text
challenges, i.e., dense and small text in various scenarios. In this
competition report, we establish a video text reading benchmark, DSText, which
focuses on dense and small text reading challenges in the video with various
scenarios. Compared with previous datasets, the proposed dataset introduces
three new challenges: 1) dense video texts, a new challenge for video text
spotters; 2) a high proportion of small texts; and 3) various new scenarios,
e.g., games, sports, etc. The proposed DSText includes 100 video clips from 12
open scenarios, supporting two tasks (i.e., video text tracking (Task 1) and
end-to-end video text spotting (Task 2)). During the competition period (opened
on 15th February 2023 and closed on 20th March 2023), a total of 24 teams
participated in the two proposed tasks, with around 30 valid submissions per
task. In this article, we describe detailed statistics of the dataset, the
tasks, the evaluation protocols, and summaries of the results of the ICDAR 2023
DSText competition. Moreover, we hope the benchmark will promote video text
research in the community.
Related papers
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
- DSText V2: A Comprehensive Video Text Spotting Dataset for Dense and Small Text [46.177941541282756]
We establish a video text reading benchmark, named DSText V2, which focuses on Dense and Small text reading challenges in the video.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
A high proportion of small texts, coupled with blurriness and distortion in the video, brings further challenges.
arXiv Detail & Related papers (2023-11-29T09:13:27Z)
- A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension [49.74647080936875]
We introduce a large-scale and cross-modal Video Retrieval dataset with text reading comprehension, TextVR.
The proposed TextVR requires one unified cross-modal model to recognize and comprehend texts, relate them to the visual context, and decide what text semantic information is vital for the video retrieval task.
arXiv Detail & Related papers (2023-05-05T08:00:14Z)
- Real-time End-to-End Video Text Spotter with Contrastive Representation Learning [91.15406440999939]
We propose a real-time end-to-end video text spotter with Contrastive Representation learning (CoText).
CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time end-to-end trainable framework.
A simple, lightweight architecture is designed for effective and accurate performance.
arXiv Detail & Related papers (2022-07-18T07:54:17Z)
- A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer [12.167938646139705]
We introduce a large-scale, Bilingual, Open World Video text benchmark dataset (BOVText).
Firstly, we provide 2,000+ videos with 1,750,000+ frames, 25 times larger than the existing largest dataset with incidental text in videos.
Secondly, our dataset covers 30+ open categories with a wide selection of various scenarios, e.g., Life Vlog, Driving, Movie, etc.
arXiv Detail & Related papers (2021-12-09T13:21:26Z)
- ICDAR 2021 Competition on Scene Video Text Spotting [28.439390836950025]
Scene video text spotting (SVTS) is a very important research topic because of its many real-life applications.
This paper includes dataset descriptions, task definitions, evaluation protocols and result summaries of the ICDAR 2021 SVTS competition.
arXiv Detail & Related papers (2021-07-26T01:25:57Z)
- RoadText-1K: Text Detection & Recognition Dataset for Driving Videos [26.614671477004375]
This paper introduces a new "RoadText-1K" dataset for text in driving videos.
The dataset is 20 times larger than the existing largest dataset for text in videos.
arXiv Detail & Related papers (2020-05-19T14:51:25Z)
- Text Synopsis Generation for Egocentric Videos [72.52130695707008]
We propose to generate a textual synopsis, consisting of a few sentences describing the most important events in a long egocentric video.
Users can read the short text to gain insight about the video, and more importantly, efficiently search through the content of a large video database.
arXiv Detail & Related papers (2020-05-08T00:28:00Z)