TEXTRON: Weakly Supervised Multilingual Text Detection through Data
Programming
- URL: http://arxiv.org/abs/2402.09811v1
- Date: Thu, 15 Feb 2024 09:18:18 GMT
- Title: TEXTRON: Weakly Supervised Multilingual Text Detection through Data
Programming
- Authors: Dhruv Kudale, Badri Vishal Kasuba, Venkatapathy Subramanian, Parag
Chaudhuri, Ganesh Ramakrishnan
- Abstract summary: Text detection is a challenging problem in the field of computer vision (CV).
There is a scarcity of word-level labeled data for text detection, especially for multilingual settings and Indian scripts.
We propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework.
- Score: 21.88026116276415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several recent deep learning (DL) based techniques perform considerably well
on image-based multilingual text detection. However, their performance relies
heavily on the availability and quality of training data. There are numerous
types of page-level document images consisting of information in several
modalities, languages, fonts, and layouts. This makes text detection a
challenging problem in the field of computer vision (CV), especially for
low-resource or handwritten languages. Furthermore, there is a scarcity of
word-level labeled data for text detection, especially for multilingual
settings and Indian scripts that incorporate both printed and handwritten text.
Conventionally, Indian script text detection requires training a DL model on
plenty of labeled data, but to the best of our knowledge, no relevant datasets
are available. Manual annotation of such data requires a lot of time, effort,
and expertise. In order to solve this problem, we propose TEXTRON, a Data
Programming-based approach, where users can plug various text detection methods
into a weak supervision-based learning framework. One can view this approach to
multilingual text detection as an ensemble of different CV-based techniques and
DL approaches. TEXTRON can leverage the predictions of DL models pre-trained on
a significant amount of language data in conjunction with CV-based methods to
improve text detection in other languages. We demonstrate that TEXTRON can
improve the detection performance for documents written in Indian languages,
despite the absence of corresponding labeled data. Further, through extensive
experimentation, we show the improvement brought about by our approach over
current state-of-the-art (SOTA) models, especially for handwritten Devanagari
text. Code and dataset have been made available at
https://github.com/IITB-LEAP-OCR/TEXTRON
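The data-programming formulation described in the abstract can be pictured as a set of weak labeling functions (CV heuristics and pre-trained DL detectors) whose pixel-level votes are combined into pseudo ground truth for a new language or script. The sketch below is an illustrative Python approximation of that idea, assuming a simple weighted pixel-wise vote; the actual TEXTRON aggregation and the API in the linked repository may differ, and the file name and `model` object are placeholders.

```python
# A minimal sketch (not the actual TEXTRON implementation) of the idea described
# above: several "labeling functions" -- simple CV heuristics and a pre-trained DL
# detector -- each vote on which pixels are text, and the votes are aggregated into
# a pseudo ground-truth mask that can then supervise a detector for a new language.
import numpy as np
import cv2  # OpenCV, used for the illustrative threshold-based heuristic


def lf_otsu(gray: np.ndarray) -> np.ndarray:
    """CV-based labeling function: Otsu binarization marks dark strokes as text."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return (binary > 0).astype(np.float32)  # 1.0 = text, 0.0 = background


def lf_pretrained(prob_map: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """DL-based labeling function: threshold the pixel probability map produced by
    a detector pre-trained on a high-resource language."""
    return (prob_map >= thresh).astype(np.float32)


def aggregate(label_maps, weights=None) -> np.ndarray:
    """Weak-supervision-style aggregation: a weighted pixel-wise vote across
    labeling functions yields a pseudo label mask for training."""
    stacked = np.stack(label_maps)                     # shape: (num_LFs, H, W)
    if weights is None:
        weights = np.ones(len(label_maps), dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)
    votes = np.tensordot(weights, stacked, axes=1) / weights.sum()
    return (votes >= 0.5).astype(np.uint8)


# Illustrative usage (placeholders, not from the paper):
# gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
# prob = model.predict(gray)            # hypothetical pre-trained detector
# pseudo_mask = aggregate([lf_otsu(gray), lf_pretrained(prob)], weights=[1.0, 2.0])
```

In a TEXTRON-like setting the weights would reflect how much each labeling function is trusted for a given script or document type; here they are simply hand-set constants for illustration.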
Related papers
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
- The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap for low-resource languages, especially Swahili.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z)
- IndicSTR12: A Dataset for Indic Scene Text Recognition [33.194567434881314]
This paper proposes the largest and most comprehensive real dataset - IndicSTR12 - and benchmarks STR performance on 12 major Indian languages.
The size and complexity of the proposed dataset are comparable to those of existing Latin contemporaries.
The dataset contains over 27000 word-images gathered from various natural scenes, with over 1000 word-images for each language.
arXiv Detail & Related papers (2024-03-12T18:14:48Z)
- MENTOR: Multilingual tExt detectioN TOward leaRning by analogy [59.37382045577384]
We propose a framework to detect and identify both seen and unseen language regions inside scene images.
"MENTOR" is the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection.
arXiv Detail & Related papers (2024-03-12T03:35:17Z)
- Research on Multilingual Natural Scene Text Detection Algorithm [4.514028820667202]
We propose a multilingual text detection model to address the issues of low accuracy and high difficulty in detecting multilingual text in natural scenes.
We introduce the SFM Swin Transformer feature extraction network to enhance the model's robustness in detecting characters and fonts across different languages.
We further propose a Global Semantic Branch that extracts and preserves global features for more effective text detection.
arXiv Detail & Related papers (2023-12-18T12:46:35Z)
- AnyText: Multilingual Visual Text Generation And Editing [18.811943975513483]
We introduce AnyText, a diffusion-based multilingual visual text generation and editing model.
AnyText can write characters in multiple languages; to the best of our knowledge, this is the first work to address multilingual visual text generation.
We contribute the first large-scale multilingual text images dataset, AnyWord-3M, containing 3 million image-text pairs with OCR annotations in multiple languages.
arXiv Detail & Related papers (2023-11-06T12:10:43Z)
- Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining [65.30528567491984]
This paper proposes a method for zero-shot multilingual TTS using text-only data for the target language.
The use of text-only data allows the development of TTS systems for low-resource languages.
Evaluation results demonstrate highly intelligible zero-shot TTS with a character error rate of less than 12% for an unseen language.
arXiv Detail & Related papers (2023-01-30T00:53:50Z)
- Language Agnostic Data-Driven Inverse Text Normalization [6.43601166279978]
The inverse text normalization (ITN) problem attracts the attention of researchers from various fields.
Due to the scarcity of labeled spoken-written datasets, the studies on non-English data-driven ITN are quite limited.
We propose a language-agnostic data-driven ITN framework to fill this gap.
arXiv Detail & Related papers (2023-01-20T10:33:03Z)
- XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages [11.581072296148031]
We conduct an extensive study using popular Transformer-based text generation models on our extended multi-lingual dataset.
Our experiments show that a multi-lingual mT5 model which uses fact-aware embeddings with structure-aware input encoding leads to best results on average across the twelve languages.
arXiv Detail & Related papers (2022-09-22T18:01:27Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)