Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering
- URL: http://arxiv.org/abs/2405.12533v1
- Date: Tue, 21 May 2024 06:48:26 GMT
- Title: Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering
- Authors: Hiba Maryam, Ling Fu, Jiajun Song, Tajrian ABM Shafayet, Qidi Luo, Xiang Bai, Yuliang Liu
- Abstract summary: This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
- Score: 50.52792174648067
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The development of Urdu scene text detection, recognition, and Visual Question Answering (VQA) technologies is crucial for advancing accessibility, information retrieval, and linguistic diversity in digital content, facilitating better understanding of and interaction with Urdu-language visual data. This initiative seeks to bridge the gap between textual and visual comprehension. We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images, which can be used for text detection, recognition, and VQA tasks. We provide fine-grained annotations for text instances, addressing the limitations of previous datasets in handling arbitrary-shaped text. By incorporating additional annotation points, this dataset facilitates the development and assessment of methods that can handle diverse text layouts, intricate shapes, and non-standard orientations commonly encountered in real-world scenarios. Moreover, the VQA annotations make it the first benchmark for Urdu Text VQA, which can prompt the development of Urdu scene text understanding. The proposed dataset is available at: https://github.com/Hiba-MeiRuan/Urdu-VQA-Dataset-/tree/main
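Fine-grained, multi-point annotations like those described above are typically matched against predictions using polygon intersection-over-union (IoU). As a minimal, hypothetical sketch (not the authors' evaluation code; it assumes convex, counter-clockwise annotation polygons given as lists of (x, y) vertices), polygon IoU can be computed with Sutherland-Hodgman clipping plus the shoelace formula:

```python
# Sketch: polygon IoU for matching arbitrary-shaped (here: convex) text regions.
# Assumes counter-clockwise lists of (x, y) vertices.

def shoelace_area(poly):
    """Polygon area via the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def clip(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by a convex CCW `clipper`."""
    def inside(p, a, b):
        # Point p is on the left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def line_intersect(p1, p2, a, b):
        # Intersection of segment p1-p2 with the infinite line through a-b.
        dpx, dpy = p1[0] - p2[0], p1[1] - p2[1]
        dcx, dcy = a[0] - b[0], a[1] - b[1]
        n1 = a[0] * b[1] - a[1] * b[0]
        n2 = p1[0] * p2[1] - p1[1] * p2[0]
        d = dcx * dpy - dcy * dpx
        return ((n1 * dpx - n2 * dcx) / d, (n1 * dpy - n2 * dcy) / d)

    output = list(subject)
    m = len(clipper)
    for i in range(m):
        a, b = clipper[i], clipper[(i + 1) % m]
        inputs, output = output, []
        if not inputs:
            break  # Polygons are disjoint.
        s = inputs[-1]
        for e in inputs:
            if inside(e, a, b):
                if not inside(s, a, b):
                    output.append(line_intersect(s, e, a, b))
                output.append(e)
            elif inside(s, a, b):
                output.append(line_intersect(s, e, a, b))
            s = e
    return output

def polygon_iou(p, q):
    inter_poly = clip(p, q)
    inter = shoelace_area(inter_poly) if len(inter_poly) >= 3 else 0.0
    union = shoelace_area(p) + shoelace_area(q) - inter
    return inter / union if union > 0 else 0.0

gt   = [(0, 0), (2, 0), (2, 2), (0, 2)]
pred = [(1, 1), (3, 1), (3, 3), (1, 3)]
print(round(polygon_iou(gt, pred), 4))  # 1/7, i.e. 0.1429
```

For concave or self-intersecting annotation polygons a general polygon-clipping library would be needed; the convex case above covers typical quadrilateral and rotated-box annotations.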
Related papers
- ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering [0.5803309695504829]
The main challenge of text-based VQA is exploiting the meaning and information of scene texts.
Recent studies tackled this challenge by considering the spatial information of scene texts in images.
We introduce a novel method that effectively exploits the information from scene texts written in Vietnamese.
arXiv Detail & Related papers (2024-10-18T03:00:03Z)
- The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in scene text resources for low-resource languages, especially Swahili.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z)
- ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images [1.2529442734851663]
We introduce the first large-scale dataset in Vietnamese specializing in the ability to understand text appearing in images.
We uncover the significance of the order in which tokens in OCR text are processed and selected to formulate answers.
arXiv Detail & Related papers (2024-04-16T15:28:30Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Leveraging machine learning for less developed languages: Progress on Urdu text detection [0.76146285961466]
We present the use of machine learning methods to detect Urdu text in natural scene images.
To support research on Urdu text, we aim to make the data freely available for research use.
arXiv Detail & Related papers (2022-09-28T12:00:34Z)
- TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation [55.83319599681002]
Text-VQA aims at answering questions that require understanding the textual cues in an image.
We develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.
arXiv Detail & Related papers (2022-08-03T02:18:09Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
- Urdu text in natural scene images: a new dataset and preliminary text detection [3.070994681743188]
This work introduces a new dataset for Urdu text in natural scene images.
The dataset comprises 500 standalone images acquired from real scenes.
The MSER (Maximally Stable Extremal Regions) method is applied to extract candidate Urdu text regions from an image.
arXiv Detail & Related papers (2021-09-16T15:41:50Z)
- TextCaps: a Dataset for Image Captioning with Reading Comprehension [56.89608505010651]
Text is omnipresent in human environments and frequently critical to understand our surroundings.
To study how to comprehend text in the context of an image we collect a novel dataset, TextCaps, with 145k captions for 28k images.
Our dataset challenges a model to recognize text, relate it to its visual context, and decide what part of the text to copy or paraphrase.
arXiv Detail & Related papers (2020-03-24T02:38:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.