A Comprehensive Gold Standard and Benchmark for Comics Text Detection
and Recognition
- URL: http://arxiv.org/abs/2212.14674v1
- Date: Tue, 27 Dec 2022 12:05:23 GMT
- Title: A Comprehensive Gold Standard and Benchmark for Comics Text Detection
and Recognition
- Authors: G\"urkan Soykan, Deniz Yuret, Tevfik Metin Sezgin
- Abstract summary: This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset.
We created the first text detection and recognition datasets for western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition".
We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS.
- Score: 2.1485350418225244
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study focuses on improving the optical character recognition (OCR) data
for panels in the COMICS dataset, the largest dataset containing text and
images from comic books. To do this, we developed a pipeline for OCR processing
and labeling of comic books and created the first text detection and
recognition datasets for western comics, called "COMICS Text+: Detection" and
"COMICS Text+: Recognition". We evaluated the performance of state-of-the-art
text detection and recognition models on these datasets and found significant
improvement in word accuracy and normalized edit distance compared to the text
in COMICS. We also created a new dataset called "COMICS Text+", which contains
the extracted text from the textboxes in the COMICS dataset. Using the improved
text data of COMICS Text+ in the comics processing model resulted in
state-of-the-art performance on cloze-style tasks without changing the model
architecture. The COMICS Text+ dataset can be a valuable resource for
researchers working on tasks including text detection, recognition, and
high-level processing of comics, such as narrative understanding, character
relations, and story generation. All the data and inference instructions can be
accessed at https://github.com/gsoykan/comics_text_plus.
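The abstract reports detector and recognizer quality in terms of word accuracy and normalized edit distance. As a rough illustration of those two metrics (a minimal sketch in Python; the exact normalization, case handling, and example strings below are assumptions, not the repository's evaluation code), one might score OCR predictions against gold transcriptions like this:

```python
# Minimal sketch (not the authors' evaluation code) of the two metrics named in
# the abstract: word accuracy and normalized edit distance. The normalization
# (dividing by the longer string) and lowercasing are assumptions.

def edit_distance(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def evaluate(predictions: list[str], references: list[str]) -> dict:
    """Word accuracy = exact matches / total; edit distance averaged after
    normalizing each pair by the length of the longer string."""
    assert len(predictions) == len(references)
    exact, ned_sum = 0, 0.0
    for pred, ref in zip(predictions, references):
        p, r = pred.strip().lower(), ref.strip().lower()
        exact += int(p == r)
        denom = max(len(p), len(r)) or 1
        ned_sum += edit_distance(p, r) / denom
    n = len(references)
    return {"word_accuracy": exact / n,
            "normalized_edit_distance": ned_sum / n}

if __name__ == "__main__":
    preds = ["GREAT SCOTT!", "the amaz1ng", "mysteries"]  # hypothetical OCR output
    golds = ["GREAT SCOTT!", "the amazing", "mysteries"]  # hypothetical gold text
    print(evaluate(preds, golds))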
Related papers
- SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection [10.08588082910962]
Text line detection is a key task in historical document analysis.
We propose a general framework for historical document text detection (SegHist).
Integrating the SegHist framework with the commonly used method DB++, we develop DB-SegHist.
arXiv Detail & Related papers (2024-06-17T11:00:04Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs.
We show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z) - Contextual Text Block Detection towards Scene Text Understanding [85.40898487745272]
This paper presents contextual text detection, a new setup that detects contextual text blocks (CTBs) for better understanding of texts in scenes.
We formulate the new setup as a dual detection task that first detects integral text units and then groups them into a CTB.
To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups them (belonging to the same CTB) into an ordered token sequence.
arXiv Detail & Related papers (2022-07-26T14:59:25Z) - SwinTextSpotter: Scene Text Spotting via Better Synergy between Text
Detection and Text Recognition [73.61592015908353]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter.
Using a transformer with dynamic head as the detector, we unify the two tasks with a novel Recognition Conversion mechanism.
The design results in a concise framework that requires neither additional rectification module nor character-level annotation.
arXiv Detail & Related papers (2022-03-19T01:14:42Z) - On Exploring and Improving Robustness of Scene Text Detection Models [20.15225372544634]
We evaluate scene text detection models on the ICDAR2015-C (IC15-C) and CTW1500-C (CTW-C) benchmarks.
We perform a robustness analysis of six key components: pre-training data, backbone, feature fusion module, multi-scale predictions, representation of text instances and loss function.
We present a simple yet effective data-based method to destroy the smoothness of text regions by merging background and foreground.
arXiv Detail & Related papers (2021-10-12T02:36:48Z) - TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped
scene text [23.04601165885908]
We propose TextOCR, an arbitrary-shaped scene text detection and recognition dataset with 900k annotated words collected on real images.
We show that current state-of-the-art text-recognition (OCR) models fail to perform well on TextOCR.
We use a TextOCR-trained OCR model to create the PixelM4C model, which can do scene-text-based reasoning on an image in an end-to-end fashion.
arXiv Detail & Related papers (2021-05-12T07:50:42Z) - TAP: Text-Aware Pre-training for Text-VQA and Text-Caption [75.44716665758415]
We propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks.
TAP explicitly incorporates scene text (generated from OCR engines) in pre-training.
Our approach outperforms the state of the art by large margins on multiple tasks.
arXiv Detail & Related papers (2020-12-08T18:55:21Z) - Rethinking Text Segmentation: A Novel Dataset and A Text-Specific
Refinement Approach [34.63444886780274]
Text segmentation is a prerequisite in real-world text-related tasks.
We introduce Text Refinement Network (TexRNet), a novel text segmentation approach.
TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-11-27T22:50:09Z) - Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)