Boosting Modern and Historical Handwritten Text Recognition with
Deformable Convolutions
- URL: http://arxiv.org/abs/2208.08109v1
- Date: Wed, 17 Aug 2022 06:55:54 GMT
- Title: Boosting Modern and Historical Handwritten Text Recognition with
Deformable Convolutions
- Authors: Silvia Cascianelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
- Abstract summary: Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task.
We propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text.
- Score: 52.250269529057014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten Text Recognition (HTR) in free-layout pages is a challenging
image understanding task that can provide a relevant boost to the digitization
of handwritten documents and reuse of their content. The task becomes even more
challenging when dealing with historical documents due to the variability of
the writing style and degradation of the page quality. State-of-the-art HTR
approaches typically couple recurrent structures for sequence modeling with
Convolutional Neural Networks for visual feature extraction. Since
convolutional kernels are defined on fixed grids and focus on all input pixels
independently while moving over the input image, this strategy disregards the
fact that handwritten characters can vary in shape, scale, and orientation even
within the same document and that the ink pixels are more relevant than the
background ones. To cope with these specific HTR difficulties, we propose to
adopt deformable convolutions, which can deform depending on the input at hand
and better adapt to the geometric variations of the text. We design two
deformable architectures and conduct extensive experiments on both modern and
historical datasets. Experimental results confirm the suitability of deformable
convolutions for the HTR task.
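To illustrate the core idea, the following minimal NumPy sketch shows a single-channel deformable convolution: each kernel tap samples the input at its regular grid position shifted by a per-location offset, with bilinear interpolation for fractional coordinates. This is only an illustrative sketch, not the paper's architecture (which uses full deformable convolutional layers inside CNN backbones); with all offsets set to zero it reduces to an ordinary fixed-grid convolution.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at fractional (y, x) with bilinear interpolation; zero padding outside."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                # weights fall off linearly with distance to each grid neighbor
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return val

def deform_conv2d(img, weight, offsets):
    """Single-channel deformable 2D convolution (stride 1, 'valid' padding).

    weight:  (k, k) kernel.
    offsets: (H_out, W_out, k*k, 2) per-output-location (dy, dx) shifts,
             one pair for each of the k*k kernel taps.
    """
    k = weight.shape[0]
    h_out = img.shape[0] - k + 1
    w_out = img.shape[1] - k + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            for idx in range(k * k):
                ky, kx = divmod(idx, k)
                dy, dx = offsets[i, j, idx]
                # sample at the grid position plus the learned offset
                out[i, j] += weight[ky, kx] * bilinear_sample(
                    img, i + ky + dy, j + kx + dx)
    return out
```

In a trained network the offsets are not fixed but predicted by a companion convolutional layer and learned end-to-end; this is what lets the effective receptive field bend toward ink pixels and follow variations in character shape, scale, and orientation, rather than attending to all pixels of a rigid grid equally.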
Related papers
- DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [7.398476020996681]
DiffusionPen (DiffPen) is a 5-shot style handwritten text generation approach based on Latent Diffusion Models.
Our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples.
Our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.
arXiv Detail & Related papers (2024-09-09T20:58:25Z)
- Representing Online Handwriting for Recognition in Large Vision-Language Models [8.344510330567495]
We propose a novel tokenized representation of digital ink (online handwriting) that encodes strokes both as a time-ordered text sequence and as an image.
We show that this representation yields results comparable to or better than state-of-the-art online handwriting recognizers.
arXiv Detail & Related papers (2024-02-23T13:11:10Z)
- Story Visualization by Online Text Augmentation with Context Memory [64.86944645907771]
We propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation.
The proposed method significantly outperforms the state of the art in various metrics including FID, character F1, frame accuracy, BLEU-2/3, and R-precision.
arXiv Detail & Related papers (2023-08-15T05:08:12Z)
- SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts at any level of precision.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z)
- Content and Style Aware Generation of Text-line Images for Handwriting Recognition [4.301658883577544]
We propose a generative method for handwritten text-line images conditioned on both visual appearance and textual content.
Our method is able to produce long text-line samples with diverse handwriting styles.
arXiv Detail & Related papers (2022-04-12T05:52:03Z)
- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [58.71128866226768]
Recent text-to-image generation methods have incrementally improved the generated image fidelity and text relevancy.
We propose a novel text-to-image method that addresses these gaps by enabling a simple control mechanism, complementary to text, in the form of a scene.
Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high fidelity images in a resolution of 512x512 pixels.
arXiv Detail & Related papers (2022-03-24T15:44:50Z)
- Continuous Offline Handwriting Recognition using Deep Learning Models [0.0]
Handwritten text recognition is an open problem of great interest in the area of automatic document image analysis.
We propose a new recognition model that integrates two types of deep learning architectures: convolutional neural networks (CNNs) and sequence-to-sequence (seq2seq) models.
The proposed model achieves results competitive with those of other well-established methodologies.
arXiv Detail & Related papers (2021-12-26T07:31:03Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- Full Page Handwriting Recognition via Image to Sequence Extraction [0.0]
The model achieves a new state of the art in full-page recognition on the IAM dataset.
It is deployed in production as part of a commercial web application.
arXiv Detail & Related papers (2021-03-11T04:37:29Z)
- SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition [48.676064155070556]
Arbitrary text appearance poses a great challenge in scene text recognition tasks.
We introduce the Structure-Preserving Inner Offset Network (SPIN), a new learnable module unrelated to geometric transformation.
SPIN allows the color manipulation of source data within the network.
arXiv Detail & Related papers (2020-05-27T01:47:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.