Unconstrained Text Detection in Manga: a New Dataset and Baseline
- URL: http://arxiv.org/abs/2009.04042v1
- Date: Wed, 9 Sep 2020 00:16:51 GMT
- Title: Unconstrained Text Detection in Manga: a New Dataset and Baseline
- Authors: Julián Del Gobbo, Rosana Matuk Herrera
- Abstract summary: This work aims to binarize text in a comic genre with highly sophisticated text styles: Japanese manga.
To overcome the lack of a manga dataset with text annotations at a pixel level, we create our own.
Using these resources, we designed and evaluated a deep network model, outperforming current methods for text binarization in manga in most metrics.
- Score: 3.04585143845864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The detection and recognition of unconstrained text is an open problem in
research. Text in comic books has unusual styles that raise many challenges for
text detection. This work aims to binarize text in a comic genre with highly
sophisticated text styles: Japanese manga. To overcome the lack of a manga
dataset with text annotations at a pixel level, we create our own. To improve
the evaluation and search of an optimal model, in addition to standard metrics
in binarization, we implement other special metrics. Using these resources, we
designed and evaluated a deep network model, outperforming current methods for
text binarization in manga in most metrics.
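As a rough illustration of what a "standard metric in binarization" can look like, the sketch below computes pixel-level precision, recall, and F1 between a predicted text mask and a ground-truth mask. It is a minimal sketch, not the authors' evaluation code: the function name `pixel_prf`, the nested-list mask representation (1 = text pixel, 0 = background), and the toy masks are all assumptions made for this example, and the paper's additional special metrics are not reproduced here.

```python
def pixel_prf(pred, truth):
    """Pixel-level precision, recall and F1 for two binary masks.

    Both masks are lists of rows with the same shape; a truthy value
    marks a text pixel. (Hypothetical helper, for illustration only.)
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # text pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as text
            elif t and not p:
                fn += 1      # text pixel missed
    # Guard against empty denominators (e.g. an all-background prediction).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy 2x3 masks, invented for this example.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
precision, recall, f1 = pixel_prf(pred_mask, true_mask)
print(precision, recall, f1)
```

Because every pixel counts equally, a score like this reflects how well character strokes are recovered, which a bounding-box overlap score cannot capture.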
Related papers
- KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark [1.5409800688911346]
We introduce the first Khmer scene-text dataset, featuring 1,544 expert-annotated images.
This diverse dataset includes flat text, raised text, poorly illuminated text, distant text, and partially obscured text.
arXiv Detail & Related papers (2024-10-23T21:04:24Z)
- TextMastero: Mastering High-Quality Scene Text Editing in Diverse Languages and Styles [12.182588762414058]
Scene text editing aims to modify texts on images while maintaining the style of newly generated text similar to the original.
Recent works leverage diffusion models, showing improved results, yet still face challenges.
We present TextMastero, a carefully designed multilingual scene text editing architecture based on latent diffusion models (LDMs).
arXiv Detail & Related papers (2024-08-20T08:06:09Z)
- The Manga Whisperer: Automatically Generating Transcriptions for Comics [55.544015596503726]
We present a unified model, Magi, that is able to detect panels, text boxes and character boxes.
We propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript.
arXiv Detail & Related papers (2024-01-18T18:59:09Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering [118.30923824681642]
TextDiffuser-2 aims to unleash the power of language models for text rendering.
We utilize the language model within the diffusion model to encode the position and texts at the line level.
We conduct extensive experiments and incorporate user studies involving human participants as well as GPT-4V.
arXiv Detail & Related papers (2023-11-28T04:02:40Z)
- Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection [37.083051419659135]
Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs.
Unlike existing methods mainly based on distances, we propose a deep learning-based method using scene graph generation models.
Experimental results demonstrate that our scene-graph-based approach outperforms existing methods, achieving a prediction accuracy of over 75%.
arXiv Detail & Related papers (2023-06-30T08:34:08Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Stylized Data-to-Text Generation: A Case Study in the E-Commerce Domain [53.22419717434372]
We propose a new task, namely stylized data-to-text generation, whose aim is to generate coherent text according to a specific style.
This task is non-trivial, due to three challenges: the logic of the generated text, unstructured style reference, and biased training samples.
We propose a novel stylized data-to-text generation model, named StyleD2T, comprising three components: logic planning-enhanced data embedding, mask-based style embedding, and unbiased stylized text generation.
arXiv Detail & Related papers (2023-05-05T03:02:41Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Detection of Furigana Text in Images [1.77898701462905]
Furigana are pronunciation notes used in Japanese writing.
Being able to detect these can help improve optical character recognition (OCR) performance.
This project focuses on detecting furigana in Japanese books and comics.
arXiv Detail & Related papers (2022-07-08T15:27:19Z)
- Unconstrained Text Detection in Manga [3.04585143845864]
This work aims to identify text characters at a pixel level in a comic genre with highly sophisticated text styles: Japanese manga.
Most of the literature in text detection uses bounding box metrics, which are unsuitable for pixel-level evaluation.
Using these resources, we designed and evaluated a deep network model, outperforming current methods for text detection in manga in most metrics.
arXiv Detail & Related papers (2020-10-07T13:28:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.