Adaptive Text Recognition through Visual Matching
- URL: http://arxiv.org/abs/2009.06610v1
- Date: Mon, 14 Sep 2020 17:48:53 GMT
- Title: Adaptive Text Recognition through Visual Matching
- Authors: Chuhan Zhang, Ankush Gupta, Andrew Zisserman
- Abstract summary: We introduce a new model that exploits the repetitive nature of characters in languages.
By doing this, we turn text recognition into a shape matching problem.
We show that it can handle challenges that traditional architectures are not able to solve without expensive retraining.
- Score: 86.40870804449737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, our objective is to address the problems of generalization and
flexibility for text recognition in documents. We introduce a new model that
exploits the repetitive nature of characters in languages, and decouples the
visual representation learning and linguistic modelling stages. By doing this,
we turn text recognition into a shape matching problem, and thereby achieve
generalization in appearance and flexibility in classes. We evaluate the new
model on both synthetic and real datasets across different alphabets and show
that it can handle challenges that traditional architectures are not able to
solve without expensive retraining, including: (i) it can generalize to unseen
fonts without new exemplars from them; (ii) it can flexibly change the number
of classes, simply by changing the exemplars provided; and (iii) it can
generalize to new languages and new characters that it has not been trained for
by providing a new glyph set. We show significant improvements over
state-of-the-art models for all these cases.
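The core idea, recognition as shape matching against a provided glyph set, can be illustrated with a minimal sketch. This is not the paper's architecture (which learns the visual encoder and a separate linguistic/decoding stage); here random vectors stand in for learned visual features, and `match_decode` is a hypothetical name for nearest-exemplar decoding:

```python
# Minimal sketch of recognition-as-shape-matching: each position in a text
# line is assigned the class of its most visually similar glyph exemplar.
import numpy as np

def cosine_similarity(a, b):
    # a: (T, D) per-position line features; b: (K, D) exemplar features -> (T, K)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def match_decode(line_feats, exemplar_feats, classes):
    """Decode a line by matching each position to its nearest glyph exemplar."""
    sim = cosine_similarity(line_feats, exemplar_feats)  # (T, K)
    return [classes[k] for k in sim.argmax(axis=1)]

# Toy demo: random vectors stand in for features from a learned visual encoder.
rng = np.random.default_rng(0)
exemplars = rng.normal(size=(3, 8))   # one exemplar feature per class
classes = ["a", "b", "c"]
# A "line" whose positions are noisy copies of exemplars c, a, b.
line = exemplars[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 8))
print(match_decode(line, exemplars, classes))  # ['c', 'a', 'b']
```

Note how the sketch mirrors point (ii): the set of recognizable classes is determined entirely by the exemplars passed in, so swapping the glyph set changes the alphabet without any retraining.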
Related papers
- One-Shot Multilingual Font Generation Via ViT [2.023301270280465]
Font design poses unique challenges for logographic languages like Chinese, Japanese, and Korean.
This paper introduces a novel Vision Transformer (ViT)-based model for multi-language font generation.
arXiv Detail & Related papers (2024-12-15T23:52:35Z)
- Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation [70.95783968368124]
We introduce a novel multi-modal autoregressive model, dubbed InstaManip.

We propose an innovative group self-attention mechanism to break down the in-context learning process into two separate stages.
Our method surpasses previous few-shot image manipulation models by a notable margin.
arXiv Detail & Related papers (2024-12-02T01:19:21Z)
- Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings [5.257719744958367]
This thesis explores three challenging settings in text classification by leveraging the intrinsic knowledge of pretrained language models (PLMs)
We develop models that utilize features based on contextualized word representations from PLMs, achieving performance that rivals or surpasses human accuracy.
Lastly, we tackle the sensitivity of large language models to in-context learning prompts by selecting effective demonstrations.
arXiv Detail & Related papers (2024-08-28T09:07:30Z)
- We're Calling an Intervention: Exploring the Fundamental Hurdles in Adapting Language Models to Nonstandard Text [8.956635443376527]
We present a suite of experiments that allow us to understand the underlying challenges of language model adaptation to nonstandard text.
We do so by designing interventions that approximate several types of linguistic variation and their interactions with existing biases of language models.
Applying our interventions during language model adaptation with varying size and nature of training data, we gain important insights into when knowledge transfer can be successful.
arXiv Detail & Related papers (2024-04-10T18:56:53Z)
- Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation [82.5217996570387]
We adapt a pre-trained language model for auto-regressive text-to-image generation.
We find that pre-trained language models offer limited help.
arXiv Detail & Related papers (2023-11-27T07:19:26Z)
- Text-driven Prompt Generation for Vision-Language Models in Federated Learning [24.005620820818756]
Our work proposes Federated Text-driven Prompt Generation (FedTPG)
FedTPG learns a unified prompt generation network across multiple remote clients in a scalable manner.
Our comprehensive empirical evaluations on nine diverse image classification datasets show that our method is superior to existing federated prompt learning methods.
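Learning one prompt-generation network across remote clients suggests a federated aggregation step. The sketch below shows generic FedAvg-style weight averaging; FedTPG's actual protocol may differ, and the parameter name `prompt_gen.W` is purely illustrative:

```python
# Hedged sketch: server-side aggregation of a shared prompt-generation
# network across federated clients (FedAvg-style; not FedTPG's exact method).
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average per-client parameter dicts, weighted by local dataset size."""
    total = sum(client_sizes)
    return {
        key: sum(w[key] * (n / total) for w, n in zip(client_weights, client_sizes))
        for key in client_weights[0]
    }

clients = [
    {"prompt_gen.W": np.full((2, 2), 1.0)},  # client with 10 local samples
    {"prompt_gen.W": np.full((2, 2), 3.0)},  # client with 30 local samples
]
merged = fedavg(clients, client_sizes=[10, 30])
print(merged["prompt_gen.W"][0, 0])  # 2.5
```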
arXiv Detail & Related papers (2023-10-09T19:57:24Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, CLIP image representations and scaling of language models, do not consistently improve multimodal self-rationalization of tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- Towards Open-Set Text Recognition via Label-to-Prototype Learning [18.06730376866086]
We propose a label-to-prototype learning framework to handle novel characters without retraining the model.
Extensive experiments show that our method achieves promising performance on a variety of zero-shot, close-set, and open-set text recognition datasets.
arXiv Detail & Related papers (2022-03-10T06:22:51Z)
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
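One simple analysis in the spirit of assessing generated-text novelty is to measure what fraction of a generation's n-grams never occur in the training corpus. This is a minimal sketch, not RAVEN's actual suite, and `novelty_rate` is an illustrative name:

```python
# Hedged sketch: share of generated n-grams that are absent from the
# training text (one crude proxy for linguistic novelty).
def ngrams(tokens, n):
    """Set of n-grams (as tuples) over a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_rate(generated, training, n=2):
    """Fraction of the generation's unique n-grams not seen in training."""
    gen = ngrams(generated.split(), n)
    train = ngrams(training.split(), n)
    return len(gen - train) / len(gen) if gen else 0.0

train = "the cat sat on the mat"
gen = "the cat sat on the hat"
print(novelty_rate(gen, train, n=2))  # 0.2 -> only "the hat" is novel
```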
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
- Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction.
Our approach enables us to massively scale up the number of character types we can effectively model.
We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.