Related papers: A Large-scale Dataset for Robust Complex Anime Scene Text Detection

URL: http://arxiv.org/abs/2510.07951v1
Date: Thu, 09 Oct 2025 08:47:52 GMT
Title: A Large-scale Dataset for Robust Complex Anime Scene Text Detection
Authors: Ziyi Dong, Yurui Zhang, Changmao Li, Naomi Rue Golding, Qing Long,
Abstract summary: Current text detection datasets primarily target natural or document scenes. AnimeText is a large-scale dataset containing 735K images and 4.2M annotated text blocks. It features hierarchical annotations and hard negative samples tailored for anime scenarios.
Score: 5.31665838601315
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Current text detection datasets primarily target natural or document scenes, where text typically appears in regular fonts and shapes, monotonous colors, and orderly layouts. The text is usually arranged along straight or curved lines. However, these characteristics differ significantly from anime scenes, where text is often diverse in style, irregularly arranged, and easily confused with complex visual elements such as symbols and decorative patterns. Text in anime scenes also includes a large number of handwritten and stylized fonts. Motivated by this gap, we introduce AnimeText, a large-scale dataset containing 735K images and 4.2M annotated text blocks. It features hierarchical annotations and hard negative samples tailored for anime scenarios. To evaluate the robustness of AnimeText in complex anime scenes, we conducted cross-dataset benchmarking using state-of-the-art text detection methods. Experimental results demonstrate that models trained on AnimeText outperform those trained on existing datasets in anime scene text detection tasks. AnimeText on HuggingFace: https://huggingface.co/datasets/deepghs/AnimeText

Related papers

KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark [1.5409800688911346]
We introduce the first Khmer scene-text dataset, featuring 1,544 expert-annotated images. This diverse dataset includes flat text, raised text, poorly illuminated text, distant text, and partially obscured text.
arXiv Detail & Related papers (2024-10-23T21:04:24Z)
EAFormer: Scene Text Segmentation with Edge-Aware Transformers [56.15069996649572]
Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts. We propose Edge-Aware Transformers, EAFormer, to segment texts more accurately, especially at the edge of texts.
arXiv Detail & Related papers (2024-07-24T06:00:33Z)
Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension. We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images. We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features. With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
Expressive Text-to-Image Generation with Rich Text [42.923053338525804]
We propose a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis.
arXiv Detail & Related papers (2023-04-13T17:59:55Z)
Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis. The first hierarchical scene text dataset is introduced to enable this novel research task. We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning [65.57338873921168]
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision. In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module. We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
arXiv Detail & Related papers (2021-12-14T16:22:25Z)
Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection. It is a general labeling method for texts with various shapes and requires low labeling costs. Experiments show that the proposed method bridges the performance gap between weak labeling methods and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
Unconstrained Text Detection in Manga [3.04585143845864]
This work aims to identify text characters at a pixel level in a comic genre with highly sophisticated text styles: Japanese manga. Most of the literature in text detection uses bounding box metrics, which are unsuitable for pixel-level evaluation. Using these resources, we designed and evaluated a deep network model, outperforming current methods for text detection in manga in most metrics.
arXiv Detail & Related papers (2020-10-07T13:28:13Z)
Unconstrained Text Detection in Manga: a New Dataset and Baseline [3.04585143845864]
This work aims to binarize text in a comic genre with highly sophisticated text styles: Japanese manga. To overcome the lack of a manga dataset with text annotations at a pixel level, we create our own. Using these resources, we designed and evaluated a deep network model, outperforming current methods for text binarization in manga in most metrics.
arXiv Detail & Related papers (2020-09-09T00:16:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.