ICDAR 2023 Video Text Reading Competition for Dense and Small Text
- URL: http://arxiv.org/abs/2304.04376v1
- Date: Mon, 10 Apr 2023 04:20:34 GMT
- Title: ICDAR 2023 Video Text Reading Competition for Dense and Small Text
- Authors: Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou,
Umapada Pal, Dimosthenis Karatzas, Xiang Bai
- Abstract summary: We establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks (i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2)).
- Score: 61.138557702185274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video text detection, tracking, and recognition in natural scenes
have recently become very popular in the computer vision community. However, most
existing algorithms and benchmarks focus on common text cases (e.g., normal
size and density) and single scenarios, while ignoring extreme video text
challenges, i.e., dense and small text in various scenarios. In this
competition report, we establish a video text reading benchmark, DSText, which
focuses on dense and small text reading challenges in the video with various
scenarios. Compared with previous datasets, the proposed dataset introduces
three new challenges: 1) dense video texts, a new challenge for video text
spotters; 2) a high proportion of small texts; and 3) various new scenarios,
e.g., games, sports, etc. The proposed DSText includes 100 video clips from 12
open scenarios, supporting two tasks (i.e., video text tracking (Task 1) and
end-to-end video text spotting (Task 2)). During the competition period (opened
on 15th February 2023 and closed on 20th March 2023), a total of 24 teams
participated in the two proposed tasks, with around 30 valid submissions per
task. In this article, we describe detailed statistics of the dataset, the
tasks, the evaluation protocols, and summaries of the results of the ICDAR 2023
DSText competition. Moreover, we hope the benchmark will promote video text
research in the community.
Related papers
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
- DSText V2: A Comprehensive Video Text Spotting Dataset for Dense and Small Text [46.177941541282756]
We establish a video text reading benchmark, named DSText V2, which focuses on Dense and Small text reading challenges in the video.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
A high proportion of small texts, coupled with blurriness and distortion in the video, brings further challenges.
arXiv Detail & Related papers (2023-11-29T09:13:27Z)
- A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension [49.74647080936875]
We introduce a large-scale and cross-modal Video Retrieval dataset with text reading comprehension, TextVR.
The proposed TextVR requires one unified cross-modal model to recognize and comprehend texts, relate them to the visual context, and decide what text semantic information is vital for the video retrieval task.
arXiv Detail & Related papers (2023-05-05T08:00:14Z)
- Real-time End-to-End Video Text Spotter with Contrastive Representation Learning [91.15406440999939]
We propose a real-time end-to-end video text spotter with Contrastive Representation learning (CoText).
CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time end-to-end trainable framework.
A simple, lightweight architecture is designed for effective and accurate performance.
arXiv Detail & Related papers (2022-07-18T07:54:17Z)
- A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer [12.167938646139705]
We introduce a large-scale, Bilingual, Open World Video text benchmark dataset (BOVText).
Firstly, we provide 2,000+ videos with 1,750,000+ frames, 25 times larger than the existing largest dataset with incidental text in videos.
Secondly, our dataset covers 30+ open categories with a wide selection of various scenarios, e.g., Life Vlog, Driving, Movie, etc.
arXiv Detail & Related papers (2021-12-09T13:21:26Z)
- ICDAR 2021 Competition on Scene Video Text Spotting [28.439390836950025]
Scene video text spotting (SVTS) is a very important research topic because of its many real-life applications.
This paper includes dataset descriptions, task definitions, evaluation protocols and result summaries of the ICDAR 2021 SVTS competition.
arXiv Detail & Related papers (2021-07-26T01:25:57Z)
- RoadText-1K: Text Detection & Recognition Dataset for Driving Videos [26.614671477004375]
This paper introduces a new "RoadText-1K" dataset for text in driving videos.
The dataset is 20 times larger than the existing largest dataset for text in videos.
arXiv Detail & Related papers (2020-05-19T14:51:25Z)
- Text Synopsis Generation for Egocentric Videos [72.52130695707008]
We propose to generate a textual synopsis, consisting of a few sentences describing the most important events in a long egocentric video.
Users can read the short text to gain insight about the video, and more importantly, efficiently search through the content of a large video database.
arXiv Detail & Related papers (2020-05-08T00:28:00Z)