Open-set Text Recognition via Character-Context Decoupling
- URL: http://arxiv.org/abs/2204.05535v1
- Date: Tue, 12 Apr 2022 05:43:46 GMT
- Title: Open-set Text Recognition via Character-Context Decoupling
- Authors: Chang Liu, Chun Yang, Xu-Cheng Yin
- Abstract summary: The open-set text recognition task is an emerging challenge that requires an extra capability to cognize novel characters during evaluation.
We argue that a major cause of the limited performance for current methods is the confounding effect of contextual information over the visual information of individual characters.
A Character-Context Decoupling framework is proposed to alleviate this problem by separating contextual information and character-visual information.
- Score: 16.2819099852748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The open-set text recognition task is an emerging challenge that requires an
extra capability to cognize novel characters during evaluation. We argue that a
major cause of the limited performance for current methods is the confounding
effect of contextual information over the visual information of individual
characters. Under open-set scenarios, the intractable bias in contextual
information can be passed down to visual information, consequently impairing
the classification performance. In this paper, a Character-Context Decoupling
framework is proposed to alleviate this problem by separating contextual
information and character-visual information. Contextual information can be
decomposed into temporal information and linguistic information. Here, temporal
information that models character order and word length is isolated with a
detached temporal attention module. Linguistic information that models n-gram
and other linguistic statistics is separated with a decoupled context anchor
mechanism. A variety of quantitative and qualitative experiments show that our
method achieves promising performance on open-set, zero-shot, and close-set
text recognition datasets.
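The framework described above separates a character-visual classifier from temporal and linguistic context. Below is a minimal PyTorch-style sketch of that decoupling idea, written only from the abstract: the class names, the detach-based temporal attention, and the additive context anchor are illustrative assumptions, not the authors' released implementation.
```python
# Sketch of character-context decoupling (assumed names and fusion scheme).
import torch
import torch.nn as nn


class DetachedTemporalAttention(nn.Module):
    """Per-step attention over frame features; attention weights are computed
    on detached features so temporal (order/length) modelling does not bias
    the character-visual representation."""

    def __init__(self, feat_dim: int, max_len: int):
        super().__init__()
        self.query = nn.Embedding(max_len, feat_dim)  # one query per decoding step
        self.scale = feat_dim ** -0.5

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, D) visual frame features
        q = self.query.weight.unsqueeze(0)                          # (1, L, D)
        attn = (q @ frames.detach().transpose(1, 2)) * self.scale   # (B, L, T)
        attn = attn.softmax(dim=-1)
        # the glimpse itself still carries gradients to the visual backbone
        return attn @ frames                                        # (B, L, D)


class CharacterContextDecoupler(nn.Module):
    """Keeps character-visual classification separate from a context-anchor
    branch that absorbs linguistic (n-gram style) statistics during training."""

    def __init__(self, feat_dim: int, num_classes: int, max_len: int):
        super().__init__()
        self.temporal_attn = DetachedTemporalAttention(feat_dim, max_len)
        self.visual_cls = nn.Linear(feat_dim, num_classes)  # character-visual head
        self.context_anchor = nn.Parameter(torch.zeros(max_len, num_classes))

    def forward(self, frames: torch.Tensor, use_context: bool = True):
        glimpses = self.temporal_attn(frames)                # (B, L, D)
        visual_logits = self.visual_cls(glimpses)            # (B, L, C)
        if use_context:
            # additive context bias; dropping it at evaluation keeps scores
            # for novel characters driven by visual evidence only
            return visual_logits + self.context_anchor
        return visual_logits


# Usage: batch of 2 images, 25 frame features of width 256, 100 known classes
model = CharacterContextDecoupler(feat_dim=256, num_classes=100, max_len=30)
frames = torch.randn(2, 25, 256)
train_logits = model(frames, use_context=True)
open_set_logits = model(frames, use_context=False)
```
Dropping the context anchor at evaluation time mirrors the goal stated in the abstract: predictions for novel characters should rest on visual evidence rather than on linguistic bias learned from the training vocabulary.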
Related papers
- MARS: Paying more attention to visual attributes for text-based person search [6.438244172631555]
This paper presents a novel text-based person search (TBPS) architecture named MARS (Mae-Attribute-Relation-Sensitive).
It enhances current state-of-the-art models by introducing two key components: a Visual Reconstruction Loss and an Attribute Loss.
Experiments on three commonly used datasets, namely CUHK-PEDES, ICFG-PEDES, and RSTPReid, report performance improvements.
arXiv Detail & Related papers (2024-07-05T06:44:43Z)
- Putting Context in Context: the Impact of Discussion Structure on Text Classification [13.15873889847739]
We propose a series of experiments on a large dataset for stance detection in English.
We evaluate the contribution of different types of contextual information.
We show that structural information can be highly beneficial to text classification but only under certain circumstances.
arXiv Detail & Related papers (2024-02-05T12:56:22Z)
- Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation [22.38338205905379]
We leverage reinforcement learning algorithms to overcome the above challenges by introducing a novel reward function.
Our reward function combines an accuracy metric and a faithfulness metric to provide a balanced quality judgment of generated responses.
arXiv Detail & Related papers (2023-11-02T02:42:41Z)
- Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z)
- Text-Only Training for Visual Storytelling [107.19873669536523]
We formulate visual storytelling as a visual-conditioned story generation problem.
We propose a text-only training method that separates the learning of cross-modality alignment and story generation.
arXiv Detail & Related papers (2023-08-17T09:32:17Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% when transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network [70.47504933083218]
We propose a Visual Language Modeling Network (VisionLAN), which views the visual and linguistic information as a union.
VisionLAN significantly improves the speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition.
arXiv Detail & Related papers (2021-08-22T07:56:24Z)
- Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text.
Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv Detail & Related papers (2020-06-01T03:36:01Z)
- Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization [86.45110800123216]
In the task of text summarization, salience estimation for words, phrases or sentences is a critical component.
We propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation.
arXiv Detail & Related papers (2020-04-07T02:38:56Z)
- Matching Text with Deep Mutual Information Estimation [0.0]
We present a neural approach for general-purpose text matching with deep mutual information estimation incorporated.
Our approach, Text matching with Deep Info Max (TIM), is combined with a procedure for unsupervised representation learning.
We evaluate our text matching approach on several tasks including natural language inference, paraphrase identification, and answer selection.
arXiv Detail & Related papers (2020-03-09T15:25:37Z)