RoadText-1K: Text Detection & Recognition Dataset for Driving Videos
- URL: http://arxiv.org/abs/2005.09496v1
- Date: Tue, 19 May 2020 14:51:25 GMT
- Title: RoadText-1K: Text Detection & Recognition Dataset for Driving Videos
- Authors: Sangeeth Reddy, Minesh Mathew, Lluís Gómez, Marçal Rusiñol,
Dimosthenis Karatzas, and C.V. Jawahar
- Abstract summary: This paper introduces a new "RoadText-1K" dataset for text in driving videos.
The dataset is 20 times larger than the largest existing dataset for text in videos.
- Score: 26.614671477004375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perceiving text is crucial to understanding the semantics of outdoor
scenes and hence is a critical requirement for building intelligent systems for
driver assistance and self-driving. Most of the existing datasets for text detection
and recognition comprise still images and were mostly compiled with text in
mind. This paper introduces a new "RoadText-1K" dataset for text in driving
videos. The dataset is 20 times larger than the largest existing dataset for
text in videos. Our dataset comprises 1000 video clips of driving without any
bias towards text and with annotations for text bounding boxes and
transcriptions in every frame. State-of-the-art methods for text detection,
recognition and tracking are evaluated on the new dataset, and the results
highlight the challenges posed by unconstrained driving videos compared to existing
datasets. This suggests that RoadText-1K is suited for research and development
of reading systems, robust enough to be incorporated into more complex
downstream tasks like driver assistance and self-driving. The dataset can be
found at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtext-1k
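Since every frame carries bounding-box and transcription annotations, detection performance on a dataset like this is typically scored by per-frame IoU matching of predicted boxes against ground truth. The sketch below illustrates that standard procedure; it is not the paper's official evaluation code, and the (x1, y1, x2, y2) box format and greedy matching are assumptions for illustration.

```python
# Minimal sketch of per-frame text-detection evaluation via IoU matching.
# Box format (x1, y1, x2, y2) and the greedy matcher are illustrative
# assumptions, not the RoadText-1K release's official protocol.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def frame_precision_recall(pred_boxes, gt_boxes, thresh=0.5):
    """Greedy one-to-one matching at an IoU threshold (0.5 is conventional).
    Empty-set conventions (returning 1.0) are a simplification."""
    matched_gt = set()
    tp = 0
    for p in pred_boxes:
        best, best_j = 0.0, None
        for j, g in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > best:
                best, best_j = v, j
        if best >= thresh:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(pred_boxes) if pred_boxes else 1.0
    recall = tp / len(gt_boxes) if gt_boxes else 1.0
    return precision, recall

# Example: one frame with two ground-truth boxes and two predictions,
# of which only the first prediction overlaps a ground-truth box.
gt = [(100, 40, 180, 70), (300, 200, 420, 240)]
pred = [(102, 42, 178, 68), (500, 500, 550, 520)]
print(frame_precision_recall(pred, gt))  # -> (0.5, 0.5)
```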
Related papers
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
- Composed Video Retrieval via Enriched Context and Discriminative Embeddings [118.66322242183249]
Composed video retrieval (CoVR) is a challenging problem in computer vision.
We introduce a novel CoVR framework that leverages detailed language descriptions to explicitly encode query-specific contextual information.
Our approach achieves gains of up to around 7% in Recall@K=1 (the metric is sketched below).
arXiv Detail & Related papers (2024-03-25T17:59:03Z)
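For context, Recall@K measures the fraction of queries whose ground-truth item appears among the top-K retrieved results; K=1 demands a rank-one hit. A minimal, self-contained sketch with made-up ranked lists (not the paper's data):

```python
# Minimal sketch of Recall@K for retrieval: the fraction of queries whose
# ground-truth item appears in the top-K ranked results. Data is illustrative.

def recall_at_k(ranked_lists, ground_truth, k=1):
    hits = sum(1 for ranked, gt in zip(ranked_lists, ground_truth)
               if gt in ranked[:k])
    return hits / len(ground_truth)

# Three queries; each ranked list orders candidate video IDs by score.
ranked = [["v7", "v2", "v9"], ["v1", "v4", "v3"], ["v5", "v8", "v6"]]
truth = ["v7", "v4", "v6"]
print(recall_at_k(ranked, truth, k=1))  # 1/3: only the first query hits at rank 1
print(recall_at_k(ranked, truth, k=3))  # 1.0: all three hit within the top 3
```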
- DSText V2: A Comprehensive Video Text Spotting Dataset for Dense and Small Text [46.177941541282756]
We establish a video text reading benchmark, named DSText V2, which focuses on Dense and Small text reading challenges in videos.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
A high proportion of small text, coupled with blurriness and distortion in the video, brings further challenges.
arXiv Detail & Related papers (2023-11-29T09:13:27Z)
- Reading Between the Lanes: Text VideoQA on the Road [27.923465943344723]
RoadTextVQA is a new dataset for the task of video question answering (VideoQA) in the context of driver assistance.
RoadTextVQA consists of 3,222 driving videos collected from multiple countries, annotated with 10,500 questions.
We assess the performance of state-of-the-art video question answering models on our RoadTextVQA dataset.
arXiv Detail & Related papers (2023-07-08T10:11:29Z)
- A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension [49.74647080936875]
We introduce a large-scale and cross-modal Video Retrieval dataset with text reading comprehension, TextVR.
The proposed TextVR requires one unified cross-modal model to recognize and comprehend texts, relate them to the visual context, and decide which textual semantic information is vital for the video retrieval task.
arXiv Detail & Related papers (2023-05-05T08:00:14Z)
- ICDAR 2023 Video Text Reading Competition for Dense and Small Text [61.138557702185274]
We establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in videos.
Compared with previous datasets, the proposed dataset mainly includes three new challenges.
The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks: video text tracking (Task 1) and end-to-end video text spotting (Task 2).
arXiv Detail & Related papers (2023-04-10T04:20:34Z)
- Video text tracking for dense and small text based on pp-yoloe-r and sort algorithm [0.9137554315375919]
The resolution of DSText videos is 1080 × 1920, and slicing a video frame into several regions would destroy the spatial correlation of the text.
For text detection, we adopt PP-YOLOE-R, which has proven effective in small-object detection.
For text tracking, we use the SORT algorithm for its high inference speed (its association step is sketched below).
arXiv Detail & Related papers (2023-03-31T05:40:39Z)
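For reference, SORT associates detections with existing tracks frame by frame by maximizing total IoU under a one-to-one assignment (the full algorithm also predicts each track forward with a Kalman filter before matching). The sketch below shows only that association step and assumes axis-aligned boxes, whereas PP-YOLOE-R actually produces rotated ones:

```python
# Minimal sketch of SORT's frame-to-frame association step: match current
# detections to existing tracks by maximizing total IoU via the Hungarian
# algorithm. The Kalman-filter prediction step of full SORT is omitted.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2); rotated boxes, as
    produced by PP-YOLOE-R, would need a rotated-IoU computation instead."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, iou_threshold=0.3):
    """Return (track_idx, det_idx) matches plus unmatched detection indices."""
    if not tracks or not detections:
        return [], list(range(len(detections)))
    # Negate IoU so that minimizing cost maximizes total overlap.
    cost = np.array([[-iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols)
               if -cost[r, c] >= iou_threshold]
    matched_dets = {c for _, c in matches}
    unmatched = [c for c in range(len(detections)) if c not in matched_dets]
    return matches, unmatched

# Example: two existing tracks, two new detections (one overlaps track 0).
tracks = [(10, 10, 50, 30), (200, 100, 260, 130)]
dets = [(12, 11, 52, 31), (400, 400, 440, 420)]
print(associate(tracks, dets))  # -> ([(0, 0)], [1])
```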
- Real-time End-to-End Video Text Spotter with Contrastive Representation Learning [91.15406440999939]
We propose a real-time end-to-end video text spotter with Contrastive Representation Learning (CoText).
CoText simultaneously addresses three tasks (i.e., text detection, tracking, and recognition) in a real-time, end-to-end trainable framework.
A simple, lightweight architecture is designed for effective and accurate performance.
arXiv Detail & Related papers (2022-07-18T07:54:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.