A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling
- URL: http://arxiv.org/abs/2002.03509v1
- Date: Mon, 10 Feb 2020 03:01:23 GMT
- Title: A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling
- Authors: Shangbang Long, Yushuo Guan, Kaigui Bian, Cong Yao
- Abstract summary: We propose a pair of coupling modules, termed the Character Anchoring Module (CAM) and the Anchor Pooling Module (APM).
CAM localizes text in a shape-insensitive way by design, anchoring characters individually. APM then interpolates and gathers features flexibly along the character anchors, which enables sequence learning.
- Score: 32.82620509088932
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Irregular scene text recognition has attracted much attention from the research community, mainly due to the complex shapes of text in natural scenes. However, recent methods either rely on shape-sensitive modules such as bounding box regression, or discard sequence learning. To tackle these issues, we propose a pair of coupling modules, termed the Character Anchoring Module (CAM) and the Anchor Pooling Module (APM), that extract high-level semantics from two-dimensional space to form feature sequences. The proposed CAM localizes text in a shape-insensitive way by design, anchoring characters individually. APM then interpolates and gathers features flexibly along the character anchors, which enables sequence learning. The complementary modules realize a harmonious unification of spatial information and sequence learning. With the proposed modules, our recognition system surpasses previous state-of-the-art scores on irregular and perspective text datasets, including ICDAR 2015, CUTE, and Total-Text, while matching state-of-the-art performance on regular text datasets.
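The abstract only states that APM "interpolates and gathers features flexibly along the character anchors". As a rough illustration of that idea, here is a minimal pure-Python sketch that rests on assumptions the abstract does not specify: CAM is assumed to output an ordered list of (x, y) anchor points in feature-map coordinates, features are assumed to be read with bilinear interpolation, and the anchor path is assumed to be resampled by arc length to a fixed sequence length. The names `bilinear_sample`, `resample_path`, and `anchor_pool` are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Assumed layout: fmap[channel][row][col]; points are (x, y) = (col, row).
FeatureMap = Sequence[Sequence[Sequence[float]]]
Point = Tuple[float, float]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Return the C-dim feature at a continuous (x, y) via bilinear interpolation.

    Points outside the map are clamped to the border.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    feature = []
    for channel in fmap:
        top = channel[y0][x0] * (1 - dx) + channel[y0][x1] * dx
        bottom = channel[y1][x0] * (1 - dx) + channel[y1][x1] * dx
        feature.append(top * (1 - dy) + bottom * dy)
    return feature


def resample_path(anchors: Sequence[Point], num_steps: int) -> List[Point]:
    """Place num_steps points evenly by arc length along the anchor polyline."""
    if num_steps < 1 or not anchors:
        raise ValueError("need at least one anchor and one step")
    seg = [math.dist(a, b) for a, b in zip(anchors, anchors[1:])]
    total = sum(seg)
    if total == 0.0 or num_steps == 1:
        return [tuple(anchors[0])] * num_steps
    points: List[Point] = []
    walked, k = 0.0, 0  # arc length before segment k, current segment index
    for i in range(num_steps):
        target = total * i / (num_steps - 1)
        # Advance to the segment that contains the target arc length.
        while k < len(seg) - 1 and walked + seg[k] < target:
            walked += seg[k]
            k += 1
        (xa, ya), (xb, yb) = anchors[k], anchors[k + 1]
        r = 0.0 if seg[k] == 0.0 else min(max((target - walked) / seg[k], 0.0), 1.0)
        points.append((xa + r * (xb - xa), ya + r * (yb - ya)))
    return points


def anchor_pool(fmap: FeatureMap, anchors: Sequence[Point], seq_len: int) -> List[List[float]]:
    """Gather a seq_len x C feature sequence along the (hypothetical) anchor path."""
    return [bilinear_sample(fmap, x, y) for x, y in resample_path(anchors, seq_len)]


if __name__ == "__main__":
    # Toy 1-channel 3x3 map whose value encodes position: 10 * row + col.
    fmap = [[[10 * r + c for c in range(3)] for r in range(3)]]
    # An L-shaped (non-horizontal) anchor path still yields an ordered sequence.
    print(anchor_pool(fmap, [(0, 0), (2, 0), (2, 2)], seq_len=5))
    # [[0.0], [1.0], [2.0], [12.0], [22.0]]
```

Because sampling follows the anchor polyline rather than an axis-aligned box, a curved or rotated word still produces an ordered feature sequence that a standard sequence decoder could consume. How the actual APM spaces or weights its samples may differ from this sketch.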
Related papers
- Leveraging Structure Knowledge and Deep Models for the Detection of Abnormal Handwritten Text [19.05500901000957]
We propose a two-stage detection algorithm that combines structure knowledge and deep models for handwritten text.
A shape regression network trained with a novel semi-supervised contrast training strategy is introduced, and the positional relationships between characters are fully exploited.
Experiments on two handwritten text datasets show that the proposed method can greatly improve the detection performance.
arXiv Detail & Related papers (2024-10-15T14:57:10Z)
- Towards Unified Multi-granularity Text Detection with Interactive Attention [56.79437272168507]
"Detect Any Text" (DAT) is an advanced paradigm that unifies scene text detection, layout analysis, and document page detection into a single end-to-end model.
A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances.
Tests demonstrate that DAT achieves state-of-the-art performance across a variety of text-related benchmarks.
arXiv Detail & Related papers (2024-05-30T07:25:23Z)
- CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing [66.6712018832575]
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains.
We use large-scale vision-language models (VLMs) such as CLIP and leverage their textual features to dynamically adjust the classifier's weights, in order to explore generalizable visual features.
arXiv Detail & Related papers (2024-03-21T11:58:50Z)
- Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition [56.968108142307976]
We propose a novel approach called Class-Aware Mask-guided feature refinement (CAM).
Our approach introduces canonical class-aware glyph masks to suppress background and text style noise.
By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion.
arXiv Detail & Related papers (2024-02-21T09:22:45Z)
- Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling [26.420235903805782]
We propose a unified end-to-end trainable inverse-like antagonistic text spotting framework dubbed IATS.
Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary.
We show that our method achieves superior performance both on irregular and inverse-like text spotting.
arXiv Detail & Related papers (2024-01-08T02:47:47Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents a new framework for text-guided 3D shape generation called Image as Stepping Stone (ISS), which introduces 2D images as a stepping stone to connect the text and shape modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
arXiv Detail & Related papers (2022-09-09T06:54:21Z)
- I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-shaped Scene Text Detection [93.62705504233931]
We propose a novel method named Intra- and Inter-Instance Collaborative Learning (I3CL).
Specifically, to address the first issue, we design an effective convolutional module with multiple receptive fields.
To address the second issue, we devise an instance-based transformer module to exploit the dependencies between different text instances.
arXiv Detail & Related papers (2021-08-03T07:48:12Z)
- SCATTER: Selective Context Attentional Scene Text Recognizer [16.311256552979835]
Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds.
Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes.
We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER).
arXiv Detail & Related papers (2020-03-25T09:20:28Z)