Related papers: Artistic-style text detector and a new Movie-Poster dataset

URL: http://arxiv.org/abs/2406.16307v1
Date: Mon, 24 Jun 2024 04:10:28 GMT
Title: Artistic-style text detector and a new Movie-Poster dataset
Authors: Aoxiang Ning, Yiting Wei, Minglong Xue, Senming Zhong
Abstract summary: This paper proposes a method that uses Criss-Cross Attention and a residual dense block to address incomplete and erroneous detections of artistic-style text. The proposed method achieves superior performance on the Movie-Poster dataset and produces excellent results on multiple benchmark datasets.
Score: 1.6624384368855527
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Although current text detection algorithms demonstrate effectiveness in general scenarios, their performance declines when confronted with artistic-style text featuring complex structures. This paper proposes a method that utilizes Criss-Cross Attention and a residual dense block to address the incomplete and erroneous detections of artistic-style text by current algorithms. Specifically, our method mainly consists of a feature extraction backbone, a feature enhancement network, a multi-scale feature fusion module, and a boundary discrimination module. The feature enhancement network significantly enhances the model's perceptual capabilities in complex environments by fusing horizontal and vertical contextual information, allowing it to capture detailed features overlooked in artistic-style text. We incorporate a residual dense block into the Feature Pyramid Network to suppress the effect of background noise during feature fusion. To avoid complex post-processing, we explore a boundary discrimination module that guides the correct generation of boundary proposals. Furthermore, given that movie poster titles often use stylized art fonts, we collected a Movie-Poster dataset to address the scarcity of artistic-style text data. Extensive experiments demonstrate that our proposed method achieves superior performance on the Movie-Poster dataset and produces excellent results on multiple benchmark datasets. The code and the Movie-Poster dataset will be available at: https://github.com/biedaxiaohua/Artistic-style-text-detection
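
Illustrative sketch: the abstract says the feature enhancement network fuses horizontal and vertical contextual information via Criss-Cross Attention. The snippet below is a minimal, dependency-free illustration of that general attention pattern, not the authors' implementation. It assumes identity query/key/value projections and omits the learned 1x1 convolutions and the residual connection that a trained module would normally include; the function name `criss_cross_attention` is hypothetical.

```python
import math


def criss_cross_attention(feat):
    """Apply one criss-cross attention pass to an H x W grid of feature vectors.

    feat is a nested list: feat[i][j] is a list of C floats.
    Each output vector is a softmax-weighted average of the vectors that lie
    in the same row or the same column as position (i, j).
    Assumption: queries, keys and values are the raw features themselves.
    """
    h, w = len(feat), len(feat[0])
    c = len(feat[0][0])
    out = []
    for i in range(h):
        row_out = []
        for j in range(w):
            query = feat[i][j]
            # Criss-cross path: the full row plus the column, counting (i, j) once.
            path = [(i, x) for x in range(w)] + [(y, j) for y in range(h) if y != i]
            # Dot-product affinity between the query and every key on the path.
            scores = [sum(a * b for a, b in zip(query, feat[y][x])) for y, x in path]
            # Numerically stable softmax over the H + W - 1 path positions.
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
            # Aggregate the values along the path with the attention weights.
            fused = [
                sum(wt * feat[y][x][k] for wt, (y, x) in zip(weights, path))
                for k in range(c)
            ]
            row_out.append(fused)
        out.append(row_out)
    return out
```

A single pass only lets each position see its own row and column; stacking two passes gives every position an indirect path to every other position, which is one reason this pattern is cheaper than full self-attention over all H x W positions.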

Related papers

WAS: Dataset and Methods for Artistic Text Segmentation [57.61335995536524]
This paper focuses on the more challenging task of artistic text segmentation and constructs a real artistic text segmentation dataset. We propose a decoder with the layer-wise momentum query to prevent the model from ignoring stroke regions of special shapes. We also propose a skeleton-assisted head to guide the model to focus on the global structure.
arXiv Detail & Related papers (2024-07-31T18:29:36Z)
Seeing Text in the Dark: Algorithm and Benchmark [28.865779563872977]
In this work, we propose an efficient and effective single-stage approach for localizing text in the dark. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. We present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages.
arXiv Detail & Related papers (2024-04-13T11:07:10Z)
Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions. We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features. With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
Deformation Robust Text Spotting with Geometric Prior [5.639053898266709]
We develop a robust text spotting method (DR TextSpotter) to solve the recognition problem of complex deformation of characters in different fonts. A graph convolution network is constructed to fuse the character features and landmark features, and then performs semantic reasoning to enhance the discrimination for different characters.
arXiv Detail & Related papers (2023-08-31T02:13:15Z)
Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions [22.090074821554754]
We propose a self-supervised scene text segmentation algorithm with layered decoupling of representations, derived in an object-centric manner, to segment images into text and background. On several public scene text datasets, our method outperforms the state-of-the-art unsupervised segmentation algorithms.
arXiv Detail & Related papers (2023-08-25T05:00:05Z)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map. We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition [63.6608759501803]
We propose to recognize artistic text at three levels. Firstly, corner points are applied to guide the extraction of local features inside characters, considering the robustness of corner structures to appearance and shape. Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification. Thirdly, we utilize a Transformer to learn the global feature at the image level and model the global relationship of the corner points.
arXiv Detail & Related papers (2022-07-31T14:11:05Z)
Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection [20.85468268945721]
We propose an attention-based feature decomposition-reconstruction network for scene text detection. We use contextual information and low-level features to enhance the performance of segmentation-based text detectors. Experiments have been conducted on two public benchmark datasets, and the results show that our proposed method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-11-29T06:15:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.