The Learnable Typewriter: A Generative Approach to Text Analysis
- URL: http://arxiv.org/abs/2302.01660v3
- Date: Fri, 14 Apr 2023 14:08:29 GMT
- Title: The Learnable Typewriter: A Generative Approach to Text Analysis
- Authors: Ioannis Siglidis, Nicolas Gonthier, Julien Gaubil, Tom Monnier and
Mathieu Aubry
- Abstract summary: We present a generative document-specific approach to character analysis and recognition in text lines.
Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a generative document-specific approach to character analysis and
recognition in text lines. Our main idea is to build on unsupervised
multi-object segmentation methods and in particular those that reconstruct
images based on a limited amount of visual elements, called sprites. Taking as
input a set of text lines with similar font or handwriting, our approach can
learn a large number of different characters and leverage line-level
annotations when available. Our contribution is twofold. First, we provide the
first adaptation and evaluation of a deep unsupervised multi-object
segmentation approach for text line analysis. Since these methods have mainly
been evaluated on synthetic data in a completely unsupervised setting,
demonstrating that they can be adapted and quantitatively evaluated on real
images of text and that they can be trained using weak supervision represents
significant progress. Second, we show the potential of our method for new
applications, more specifically in the field of paleography, which studies the
history and variations of handwriting, and for cipher analysis. We demonstrate
our approach on three very different datasets: a printed volume of the
Google1000 dataset, the Copiale cipher and historical handwritten charters from
the 12th and early 13th centuries.
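The core idea of reconstructing a text line from a small bank of learned sprites can be sketched as follows. This is a minimal illustration, not the authors' model: the sprite bank, transparency masks, compositing rule, and the fixed placement list are all simplifying assumptions (the actual method learns sprites and predicts placements with a deep network).

```python
import numpy as np

def compose_line(sprites, alphas, placements, line_w, line_h):
    """Reconstruct a text-line image by pasting sprites (learned
    character prototypes) onto a blank canvas at given positions.

    sprites    : (K, h, w) grayscale prototypes in [0, 1]
    alphas     : (K, h, w) per-sprite transparency masks in [0, 1]
    placements : list of (sprite_index, x_offset) pairs
    """
    canvas = np.ones((line_h, line_w))  # white background
    for k, x in placements:
        h, w = sprites[k].shape
        patch = canvas[:h, x:x + w]
        # alpha-composite the sprite over the current canvas content
        canvas[:h, x:x + w] = alphas[k] * sprites[k] + (1 - alphas[k]) * patch
    return canvas

# toy example: two 8x6 "sprites" placed on a 16x40 line
rng = np.random.default_rng(0)
sprites = rng.random((2, 8, 6))
alphas = np.ones((2, 8, 6))  # fully opaque masks for illustration
line = compose_line(sprites, alphas, [(0, 2), (1, 12)], line_w=40, line_h=16)
print(line.shape)  # (16, 40)
```

In the paper's setting, the reconstruction error between such a composed line and the real image is what drives learning, so the sprites converge to the document-specific character shapes.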
Related papers
- General Detection-based Text Line Recognition
We introduce a general detection-based approach to text line recognition, be it printed (OCR) or handwritten (HTR).
Our approach builds on a completely different paradigm than state-of-the-art HTR methods, which rely on autoregressive decoding.
We improve state-of-the-art performances for Chinese script recognition on the CASIA v2 dataset, and for cipher recognition on the Borg and Copiale datasets.
arXiv Detail & Related papers (2024-09-25T17:05:55Z) - Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Visual grounding (VG) aims to establish fine-grained alignment between vision and language.
Most existing VG datasets are constructed using simple description texts.
We propose a novel benchmark of Scene Knowledge-guided Visual Grounding.
arXiv Detail & Related papers (2023-07-21T13:06:02Z) - Holistic Visual-Textual Sentiment Analysis with Prior Models
We propose a holistic method that achieves robust visual-textual sentiment analysis.
The proposed method consists of four parts: (1) a visual-textual branch to learn features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders to extract selected semantic visual features, (3) a CLIP branch to implicitly model visual-textual correspondence, and (4) a multimodal feature fusion network based on BERT to fuse multimodal features and make sentiment predictions.
arXiv Detail & Related papers (2022-11-23T14:40:51Z) - TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z) - Vision-Language Pre-Training for Boosting Scene Text Detectors
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z) - Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods
We present a study conducted using three state-of-the-art systems: Doc-UFCN, dhSegment and ARU-Net.
We show that it is possible to build generic models trained on a wide variety of historical document datasets that can correctly segment diverse unseen pages.
arXiv Detail & Related papers (2022-03-23T11:56:25Z) - Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% when transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z) - Whole page recognition of historical handwriting
We investigate an end-to-end inference approach without text localization which takes a handwritten page and transcribes its full text.
No explicit character, word or line segmentation is involved in inference, which is why we call this approach "segmentation free".
We conclude that a whole page inference approach without text localization and segmentation is competitive.
arXiv Detail & Related papers (2020-09-22T15:46:33Z) - Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
We introduce a multimodal approach for the semantic segmentation of historical newspapers.
Based on experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features.
Results show consistent improvement of multimodal models in comparison to a strong visual baseline.
arXiv Detail & Related papers (2020-02-14T17:56:18Z) - TextScanner: Reading Characters in Order for Robust Scene Text Recognition
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.