ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text
Spotting
- URL: http://arxiv.org/abs/2105.03620v1
- Date: Sat, 8 May 2021 07:46:55 GMT
- Title: ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text
Spotting
- Authors: Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu
Liu, Hao Chen
- Abstract summary: End-to-end text-spotting aims to integrate detection and recognition in a unified framework.
Here, we tackle end-to-end text spotting by presenting the Adaptive Bezier Curve Network v2 (ABCNet v2).
Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text with a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only structured output but also a controllable representation.
Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 achieves state-of-the-art performance while maintaining very high efficiency.
- Score: 108.93803186429017
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: End-to-end text-spotting, which aims to integrate detection and recognition
in a unified framework, has attracted increasing attention due to the
simplicity of jointly handling the two complementary tasks. It remains an open problem
especially when processing arbitrarily-shaped text instances. Previous methods
can be roughly categorized into two groups: character-based and
segmentation-based, which often require character-level annotations and/or
complex post-processing due to the unstructured output. Here, we tackle
end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet
v2). Our main contributions are four-fold: 1) For the first time, we adaptively
fit arbitrarily-shaped text by a parameterized Bezier curve, which, compared
with segmentation-based methods, provides not only structured output but
also a controllable representation. 2) We design a novel BezierAlign layer for
extracting accurate convolution features of a text instance of arbitrary
shapes, significantly improving the precision of recognition over previous
methods. 3) Different from previous methods, which often suffer from complex
post-processing and sensitive hyper-parameters, our ABCNet v2 maintains a
simple pipeline whose only post-processing step is non-maximum suppression (NMS). 4)
As the performance of text recognition closely depends on feature alignment,
ABCNet v2 further adopts a simple yet effective coordinate convolution to
encode the position of the convolutional filters, which leads to a considerable
improvement with negligible computation overhead. Comprehensive experiments
conducted on various bilingual (English and Chinese) benchmark datasets
demonstrate that ABCNet v2 can achieve state-of-the-art performance while
maintaining very high efficiency.
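The first contribution in the abstract, fitting arbitrarily-shaped text with a parameterized Bezier curve, reduces in its simplest form to a linear least-squares fit of cubic Bezier control points on the Bernstein basis. The sketch below is a minimal NumPy illustration of that idea for a single text-boundary side, not the authors' code; the function names and the chord-length parameterization are assumptions made for the example.

```python
import numpy as np

def bernstein_matrix(ts: np.ndarray) -> np.ndarray:
    """Cubic Bernstein basis evaluated at parameters ts in [0, 1]; shape (N, 4)."""
    t = ts.reshape(-1, 1)
    return np.hstack([(1 - t) ** 3,
                      3 * t * (1 - t) ** 2,
                      3 * t ** 2 * (1 - t),
                      t ** 3])

def chord_length_params(points: np.ndarray) -> np.ndarray:
    """Normalized cumulative chord-length parameterization of (N, 2) points."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(seg)]) / max(seg.sum(), 1e-8)

def fit_cubic_bezier(points: np.ndarray, ts: np.ndarray) -> np.ndarray:
    """Least-squares fit of 4 control points (shape (4, 2)) to boundary points (N, 2)."""
    ctrl, *_ = np.linalg.lstsq(bernstein_matrix(ts), points, rcond=None)
    return ctrl

# Example: fit one gently curved side of a text boundary and check the residual.
t = np.linspace(0.0, 1.0, 20)
side = np.stack([100.0 * t, 10.0 * np.sin(np.pi * t)], axis=1)   # (20, 2) boundary points
ts = chord_length_params(side)
ctrl = fit_cubic_bezier(side, ts)                                # (4, 2) control points
residual = np.abs(bernstein_matrix(ts) @ ctrl - side).max()      # small for a smooth side
```

In ABCNet, two such curves (one per long side of the text instance) give a compact eight-control-point representation of an arbitrarily curved region.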
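The BezierAlign layer (contribution 2) warps the features inside a curved text region into a fixed-size rectangular map for the recognizer. The following is a hedged approximation of that geometry built on torch.nn.functional.grid_sample rather than the paper's dedicated RoI operator: sample points along the top and bottom Bezier curves, linearly interpolate between them to form a sampling grid, and bilinearly sample the feature map. Control-point values and output sizes are illustrative only.

```python
import torch
import torch.nn.functional as F

def cubic_bezier(ctrl: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """ctrl: (4, 2) control points; t: (n,) in [0, 1] -> (n, 2) curve points."""
    t = t[:, None]
    basis = torch.cat([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                       3 * t ** 2 * (1 - t), t ** 3], dim=1)      # (n, 4)
    return basis @ ctrl

def bezier_align(feat: torch.Tensor, top: torch.Tensor, bottom: torch.Tensor,
                 out_h: int = 8, out_w: int = 32) -> torch.Tensor:
    """feat: (1, C, H, W); top/bottom: (4, 2) control points in pixel coordinates
    of `feat`, both oriented left to right. Returns (1, C, out_h, out_w)."""
    _, _, H, W = feat.shape
    ts = torch.linspace(0.0, 1.0, out_w)
    top_pts = cubic_bezier(top, ts)                               # (out_w, 2)
    bot_pts = cubic_bezier(bottom, ts)                            # (out_w, 2)
    alphas = torch.linspace(0.0, 1.0, out_h)[:, None, None]       # (out_h, 1, 1)
    grid = (1 - alphas) * top_pts + alphas * bot_pts              # (out_h, out_w, 2)
    # Normalize to [-1, 1] for grid_sample (last dim is (x, y)).
    grid = torch.stack([grid[..., 0] / (W - 1) * 2 - 1,
                        grid[..., 1] / (H - 1) * 2 - 1], dim=-1)
    return F.grid_sample(feat, grid[None], align_corners=True)

# Example: warp a curved text region of a 64x128 feature map to a flat 8x32 map.
feat = torch.randn(1, 256, 64, 128)
top = torch.tensor([[10., 20.], [40., 5.], [80., 5.], [110., 20.]])
bottom = torch.tensor([[10., 40.], [40., 25.], [80., 25.], [110., 40.]])
aligned = bezier_align(feat, top, bottom)   # (1, 256, 8, 32)
```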
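Contribution 4 mentions coordinate convolution for encoding the position of the convolutional filters. A minimal sketch of that well-known trick is to concatenate normalized x/y coordinate channels to the feature map before an ordinary convolution; the channel counts and placement below are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv layer that sees two extra channels holding normalized x/y coordinates."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # +2 input channels for the x and y coordinate maps.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device, dtype=x.dtype)
        xs = torch.linspace(-1, 1, w, device=x.device, dtype=x.dtype)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")            # (h, w) each
        coords = torch.stack([xx, yy]).expand(n, -1, -1, -1)      # (n, 2, h, w)
        return self.conv(torch.cat([x, coords], dim=1))

# Example: drop-in replacement for a plain 3x3 convolution on 256-channel features.
layer = CoordConv2d(256, 256)
out = layer(torch.randn(2, 256, 50, 68))   # (2, 256, 50, 68)
```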
Related papers
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- SPTS v2: Single-Point Scene Text Spotting [146.98118405786445]
The new framework, SPTS v2, allows training high-performing text-spotting models using a single-point annotation.
Tests show SPTS v2 can outperform previous state-of-the-art single-point text spotters with fewer parameters.
Experiments suggest a potential preference for single-point representation in scene text spotting.
arXiv Detail & Related papers (2023-01-04T14:20:14Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network [108.07304516679103]
We propose the Adaptive Bezier-Curve Network (ABCNet) for scene text detection and recognition.
For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.
Compared with standard bounding box detection, our Bezier curve detection introduces negligible overhead, resulting in superiority of our method in both efficiency and accuracy.
arXiv Detail & Related papers (2020-02-24T12:27:31Z)
- Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting [49.768327669098674]
We propose an end-to-end trainable text spotting approach named Text Perceptron.
It first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information.
Then a novel Shape Transform Module (abbr. STM) is designed to transform the detected feature regions into regular morphologies.
arXiv Detail & Related papers (2020-02-17T08:07:19Z)