PBFormer: Capturing Complex Scene Text Shape with Polynomial Band
Transformer
- URL: http://arxiv.org/abs/2308.15004v1
- Date: Tue, 29 Aug 2023 03:41:27 GMT
- Title: PBFormer: Capturing Complex Scene Text Shape with Polynomial Band
Transformer
- Authors: Ruijin Liu, Ning Lu, Dapeng Chen, Cheng Li, Zejian Yuan, Wei Peng
- Abstract summary: We present PBFormer, an efficient yet powerful scene text detector.
It unifies a transformer with a novel text shape representation, Polynomial Band (PB).
A parameter-free cross-scale pixel attention operation helps detect small-scale texts.
- Score: 28.52028534365144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present PBFormer, an efficient yet powerful scene text detector that
unifies the transformer with a novel text shape representation Polynomial Band
(PB). The representation has four polynomial curves to fit a text's top,
bottom, left, and right sides, which can capture a text with a complex shape by
varying polynomial coefficients. PB has appealing features compared with
conventional representations: 1) It can model different curvatures with a fixed
number of parameters, while polygon-points-based methods need to utilize a
different number of points. 2) It can distinguish adjacent or overlapping texts
as they have clearly different curve coefficients, while segmentation-based or
points-based methods suffer from adhesive spatial positions. PBFormer combines
the PB with the transformer, which can directly generate smooth text contours
sampled from predicted curves without interpolation. A parameter-free
cross-scale pixel attention (CPA) module is employed to highlight the feature
map of a suitable scale while suppressing the other feature maps. The simple
operation can help detect small-scale texts and is compatible with the
one-stage DETR framework, where no postprocessing exists for NMS. Furthermore,
PBFormer is trained with a shape-contained loss, which not only enforces the
piecewise alignment between the ground truth and the predicted curves but also
makes curves' positions and shapes consistent with each other. Without bells
and whistles such as text pre-training, our method outperforms previous
state-of-the-art text detectors on arbitrary-shaped text datasets.
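The Polynomial Band idea described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: it covers only the top and bottom curves (the paper also fits left and right sides), and the function name `sample_pb_contour` and its arguments are hypothetical.

```python
import numpy as np

def sample_pb_contour(top_coeffs, bottom_coeffs, x_range, n=20):
    """Sample a closed text contour from a (simplified) Polynomial Band.

    top_coeffs / bottom_coeffs: polynomial coefficients (highest degree
    first, as numpy.polyval expects) for the top and bottom curves y(x).
    x_range: (x_left, x_right) horizontal extent of the text instance.
    """
    xs = np.linspace(x_range[0], x_range[1], n)
    top = np.stack([xs, np.polyval(top_coeffs, xs)], axis=1)
    bottom = np.stack([xs, np.polyval(bottom_coeffs, xs)], axis=1)
    # Walk left-to-right along the top curve, then right-to-left along
    # the bottom curve, yielding a closed polygon with smooth sides
    # sampled directly from the curves, without interpolation.
    return np.concatenate([top, bottom[::-1]], axis=0)

# A straight horizontal band: top curve y = 10, bottom curve y = 30.
contour = sample_pb_contour([10.0], [30.0], (0.0, 100.0), n=5)
```

Varying the polynomial coefficients changes the curvature while the parameter count stays fixed, which is the property the abstract contrasts with polygon-points-based representations.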
Related papers
- Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation [91.37036638939622]
This paper presents a new framework called Image as Stepping Stone (ISS) for the task by introducing 2D image as a stepping stone to connect the two modalities.
Our key contribution is a two-stage feature-space-alignment approach that maps CLIP features to shapes.
We formulate a text-guided shape stylization module to dress up the output shapes with novel textures.
arXiv Detail & Related papers (2022-09-09T06:54:21Z)
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer [94.35116535588332]
Transformer-based methods, which predict polygon points or Bezier curve control points to localize texts, are quite popular in scene text detection.
However, the used point label form implies the reading order of humans, which affects the robustness of Transformer model.
We propose DPText-DETR, which directly uses point coordinates as queries and dynamically updates them between decoder layers.
arXiv Detail & Related papers (2022-07-10T15:45:16Z)
- Text Spotting Transformers [29.970268691631333]
TESTR builds upon a single encoder and dual decoders for the joint text-box control point regression and character recognition.
We show that our canonical representation of control points is suitable for text instances in both Bezier curve and polygon annotations.
In addition, we design a bounding-box guided detection (box-to-polygon) process.
arXiv Detail & Related papers (2022-04-05T01:05:31Z)
- Arbitrary Shape Text Detection using Transformers [2.294014185517203]
We propose an end-to-end trainable architecture for arbitrary-shaped text detection using Transformers (DETR).
At its core, our proposed method leverages a bounding box loss function that accurately measures the arbitrary detected text regions' changes in scale and aspect ratio.
We evaluate our proposed model using Total-Text and CTW-1500 datasets for curved text, and MSRA-TD500 and ICDAR15 datasets for multi-oriented text.
arXiv Detail & Related papers (2022-02-22T22:36:29Z)
- Fourier Contour Embedding for Arbitrary-Shaped Text Detection [47.737805731529455]
We propose a novel method to represent arbitrary shaped text contours as compact signatures.
We show that FCE is accurate and robust to fit contours of scene texts even with highly-curved shapes.
Our FCENet is superior to the state-of-the-art (SOTA) methods on CTW1500 and Total-Text.
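The compact-signature idea behind Fourier Contour Embedding can be sketched as follows: a closed contour is treated as a periodic complex signal and truncated to its lowest frequencies. This is a minimal illustration of the general technique, not the FCENet implementation; the function names and the choice of `k` are hypothetical.

```python
import numpy as np

def fourier_embed(contour, k=5):
    """Compress a closed contour (N x 2 array) into 2k+1 complex Fourier
    coefficients: the DC term plus the k lowest positive and negative
    frequencies."""
    z = contour[:, 0] + 1j * contour[:, 1]   # contour points as complex numbers
    coeffs = np.fft.fft(z) / len(z)
    # Keep low frequencies only: indices 0..k and -k..-1.
    return np.concatenate([coeffs[:k + 1], coeffs[-k:]])

def fourier_reconstruct(embed, k=5, n=64):
    """Resample a smooth n-point contour from the truncated embedding."""
    t = np.arange(n) / n
    freqs = np.concatenate([np.arange(k + 1), np.arange(-k, 0)])
    z = sum(c * np.exp(2j * np.pi * f * t) for c, f in zip(embed, freqs))
    return np.stack([z.real, z.imag], axis=1)

# A circle is a single-frequency signal, so it survives truncation exactly.
theta = 2 * np.pi * np.arange(32) / 32
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
emb = fourier_embed(circle, k=5)
recon = fourier_reconstruct(emb, k=5, n=32)
```

Dropping high frequencies keeps the representation fixed-length and smooth, which is why such signatures fit even highly curved contours robustly.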
arXiv Detail & Related papers (2021-04-21T10:21:57Z)
- BOTD: Bold Outline Text Detector [85.33700624095181]
We propose a new one-stage text detector, termed Bold Outline Text Detector (BOTD).
BOTD is able to process the arbitrary-shaped text with low model complexity.
Experimental results on three real-world benchmarks show the state-of-the-art performance of BOTD.
arXiv Detail & Related papers (2020-11-30T11:54:14Z)
- TextRay: Contour-based Geometric Modeling for Arbitrary-shaped Scene Text Detection [20.34326396800748]
We propose an arbitrary-shaped text detection method, namely TextRay, which conducts top-down contour-based geometric modeling and geometric parameter learning.
Experiments on several benchmark datasets demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-08-11T16:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.