EAFormer: Scene Text Segmentation with Edge-Aware Transformers
- URL: http://arxiv.org/abs/2407.17020v1
- Date: Wed, 24 Jul 2024 06:00:33 GMT
- Title: EAFormer: Scene Text Segmentation with Edge-Aware Transformers
- Authors: Haiyang Yu, Teng Fu, Bin Li, Xiangyang Xue
- Abstract summary: Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts.
We propose Edge-Aware Transformers, EAFormer, to segment texts more accurately, especially at the edge of texts.
- Score: 56.15069996649572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts. The existing text segmentation methods tend to involve various text-related supervisions for better performance. However, most of them ignore the importance of text edges, which are significant for downstream applications. In this paper, we propose Edge-Aware Transformers, termed EAFormer, to segment texts more accurately, especially at the edge of texts. Specifically, we first design a text edge extractor to detect edges and filter out edges of non-text areas. Then, we propose an edge-guided encoder to make the model focus more on text edges. Finally, an MLP-based decoder is employed to predict text masks. We have conducted extensive experiments on commonly-used benchmarks to verify the effectiveness of EAFormer. The experimental results demonstrate that the proposed method can perform better than previous methods, especially on the segmentation of text edges. Considering that the annotations of several benchmarks (e.g., COCO_TS and MLT_S) are not accurate enough to fairly evaluate our methods, we have relabeled these datasets. Through experiments, we observe that our method can achieve a higher performance improvement when more accurate annotations are used for training.
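The first step named in the abstract, detecting edges and then discarding those that fall outside text areas, can be illustrated with a minimal sketch. The abstract does not say which edge detector is used or how text areas are located, so everything below is an assumption made for illustration: the gradient threshold, the ready-made binary `text_mask`, and the hypothetical function names `edge_map` and `filter_text_edges`. It is not the authors' implementation.

```python
def edge_map(img, threshold):
    """Mark pixels whose gradient magnitude exceeds `threshold`.

    Assumption: a plain central-difference gradient stands in for
    whatever edge detector the paper actually uses.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, with neighbours clamped at the border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            if abs(gx) + abs(gy) > threshold:
                edges[y][x] = 1
    return edges


def filter_text_edges(edges, text_mask):
    """Keep only the edge pixels that lie inside the text-region mask.

    Assumption: `text_mask` is a given binary grid (1 = text area).
    """
    return [
        [e & m for e, m in zip(edge_row, mask_row)]
        for edge_row, mask_row in zip(edges, text_mask)
    ]


# Toy grayscale image: a bright stroke at column 2 (treated as text)
# and another at column 6 (treated as background clutter).
image = [[0, 0, 9, 0, 0, 0, 5, 0] for _ in range(3)]
text_mask = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(3)]

all_edges = edge_map(image, threshold=2)
text_edges = filter_text_edges(all_edges, text_mask)
```

In this toy input, edges appear around both strokes, but only those around the stroke inside the mask survive the filtering step.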
Related papers
- Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis [52.34110239735265]
We present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis.
Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance.
arXiv Detail & Related papers (2024-05-13T05:48:35Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Weakly-Supervised Text Instance Segmentation [44.20745377169349]
We make the first attempt to perform weakly-supervised text instance segmentation by bridging text recognition and text segmentation.
The proposed method significantly outperforms weakly-supervised instance segmentation methods on ICDAR13-FST (18.95% improvement) and TextSeg (17.80% improvement) benchmarks.
arXiv Detail & Related papers (2023-03-20T03:56:47Z)
- The Surprisingly Straightforward Scene Text Removal Method With Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis [0.76146285961466]
Scene text removal (STR) is a task of erasing text from natural scene images.
We introduce a simple yet extremely effective Gated Attention (GA) and Region-of-Interest Generation (RoIG) methodology in this paper.
Experimental results on the benchmark dataset show that our method significantly outperforms existing state-of-the-art methods in almost all metrics.
arXiv Detail & Related papers (2022-10-14T03:34:21Z)
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the point label form they use implies a human reading order, which affects the robustness of the Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weak labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
- Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach [34.63444886780274]
Text segmentation is a prerequisite in real-world text-related tasks.
We introduce Text Refinement Network (TexRNet), a novel text segmentation approach.
TexRNet consistently improves text segmentation performance by nearly 2% compared to other state-of-the-art segmentation methods.
arXiv Detail & Related papers (2020-11-27T22:50:09Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)