Separate Scene Text Detector for Unseen Scripts is Not All You Need
- URL: http://arxiv.org/abs/2307.15991v1
- Date: Sat, 29 Jul 2023 14:03:05 GMT
- Title: Separate Scene Text Detector for Unseen Scripts is Not All You Need
- Authors: Prateek Keserwani, Taveena Lotey, Rohit Keshari, and Partha Pratim Roy
- Abstract summary: In the last decade, some scripts have gained the attention of the research community and achieved good detection performance.
Many scripts are low-resourced for training deep learning-based scene text detectors.
It raises a critical question: Is there a need for separate training for new scripts?
This paper acknowledges this problem and proposes a solution to detect scripts not present during training.
- Score: 12.848024214330234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text detection in the wild is a well-known problem that becomes more
challenging when handling multiple scripts. In the last decade, some scripts
have gained the attention of the research community and achieved good detection
performance. However, many scripts are low-resourced for training deep
learning-based scene text detectors. This raises a critical question: Is
separate training needed for each new script? The question remains unexplored
in the field of scene text detection. This paper acknowledges this problem and
proposes a solution to detect scripts not present during training. In this
work, we analyze cross-script text detection, i.e., training on one script and
testing on another. We found that using the same type of text annotation
(word-level or line-level) across scripts is crucial for better cross-script
text detection; a different type of annotation between scripts degrades
cross-script detection performance. Additionally, for unseen script detection,
the proposed solution uses vector embeddings to map the stroke information of
text to the corresponding script category. The proposed method is validated on
a well-known multilingual scene text dataset under a zero-shot setting. The
results show the potential of the proposed method for unseen script detection
in natural images.
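To make the zero-shot idea concrete, the sketch below shows one generic way an embedding can be matched to script categories: a text region's stroke embedding is compared by cosine similarity against one prototype vector per script, and a new script can be supported by adding its prototype at inference time without retraining. This is an illustrative assumption, not the authors' implementation; the function names (`cosine`, `assign_script`), the prototype dictionary, the script labels, the 3-D vectors and the `threshold` value are all hypothetical.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def assign_script(
    stroke_embedding: Sequence[float],
    script_prototypes: dict[str, list[float]],
    threshold: float = 0.6,  # hypothetical rejection threshold
) -> tuple[Optional[str], float]:
    """Hypothetical zero-shot assignment: pick the most similar script prototype.

    Returns (None, best_similarity) when no prototype is similar enough,
    i.e. the region is left unassigned rather than forced into a known script.
    """
    best_name: Optional[str] = None
    best_sim = -1.0
    for name, proto in script_prototypes.items():
        sim = cosine(stroke_embedding, proto)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim < threshold:
        return None, best_sim
    return best_name, best_sim


if __name__ == "__main__":
    # Toy prototypes for scripts "seen" during training (made-up vectors).
    protos = {"Latin": [1.0, 0.0, 0.0], "Arabic": [0.0, 1.0, 0.0]}
    print(assign_script([0.9, 0.1, 0.0], protos))
    # An unseen script is supported by adding a prototype; nothing is retrained.
    protos["Bangla"] = [0.0, 0.0, 1.0]
    print(assign_script([0.1, 0.0, 0.95], protos))
```

The nearest-prototype design keeps the set of script categories open-ended, which is what a zero-shot setting requires; how the paper actually obtains stroke embeddings and script vectors is not described in this abstract.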
Related papers
- TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model [17.77384627944455]
Existing scene text spotters are designed to locate and transcribe texts from images.
Our proposed scene text spotter leverages advanced PLMs to enhance performance without fine-grained detection.
Benefiting from the comprehensive language knowledge gained during the pre-training phase, the PLM-based recognition module effectively handles complex scenarios.
(2024-03-15T06:38:25Z)
- DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting [112.45423990924283]
DeepSolo++ is a simple DETR-like baseline that lets a single decoder with explicit points perform text detection, recognition, and script identification simultaneously.
Our method not only performs well in English scenes but also masters the transcription of text with complex font structures and thousand-level character classes, such as Chinese.
(2023-05-31T15:44:00Z)
- Cursive Caption Text Detection in Videos [5.117030416610515]
This paper presents a robust technique for detection of textual content appearing in video frames.
We target text in cursive script taking Urdu text as a case study.
Since it is common to have videos with caption text in multiple-scripts, cursive text is distinguished from Latin text using a script-identification module.
(2023-01-09T04:30:48Z)
- Contextual Text Block Detection towards Scene Text Understanding [85.40898487745272]
This paper presents contextual text detection, a new setup that detects contextual text blocks (CTBs) for better understanding of texts in scenes.
We formulate the new setup as a dual detection task which first detects integral text units and then groups them into a CTB.
To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups them (belonging to the same CTB) into an ordered token sequence.
(2022-07-26T14:59:25Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
(2022-03-28T23:35:45Z)
- Scene Text Retrieval via Joint Text Detection and Similarity Learning [68.24531728554892]
Scene text retrieval aims to localize and search all text instances from an image gallery, which are the same or similar to a given query text.
We address this problem by directly learning a cross-modal similarity between a query text and each text instance from natural images.
In this way, scene text retrieval can be simply performed by ranking the detected text instances with the learned similarity.
(2021-04-04T07:18:38Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weak labeling method and the original polygon-based labeling methods.
(2020-12-09T13:14:53Z)
- AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting [98.08853679310603]
This work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter).
AE TextSpotter learns both visual and linguistic features to significantly reduce ambiguity in text detection.
To our knowledge, this is the first work to improve text detection by using a language model.
(2020-08-03T08:40:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.