Open-set Text Recognition via Character-Context Decoupling
- URL: http://arxiv.org/abs/2204.05535v1
- Date: Tue, 12 Apr 2022 05:43:46 GMT
- Title: Open-set Text Recognition via Character-Context Decoupling
- Authors: Chang Liu, Chun Yang, Xu-Cheng Yin
- Abstract summary: The open-set text recognition task is an emerging challenge that requires an extra capability to cognize novel characters during evaluation.
We argue that a major cause of the limited performance for current methods is the confounding effect of contextual information over the visual information of individual characters.
A Character-Context Decoupling framework is proposed to alleviate this problem by separating contextual information and character-visual information.
- Score: 16.2819099852748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The open-set text recognition task is an emerging challenge that requires an
extra capability to cognize novel characters during evaluation. We argue that a
major cause of the limited performance for current methods is the confounding
effect of contextual information over the visual information of individual
characters. Under open-set scenarios, the intractable bias in contextual
information can be passed down to visual information, consequently impairing
the classification performance. In this paper, a Character-Context Decoupling
framework is proposed to alleviate this problem by separating contextual
information and character-visual information. Contextual information can be
decomposed into temporal information and linguistic information. Here, temporal
information that models character order and word length is isolated with a
detached temporal attention module. Linguistic information that models n-gram
and other linguistic statistics is separated with a decoupled context anchor
mechanism. A variety of quantitative and qualitative experiments show that our
method achieves promising performance on open-set, zero-shot, and close-set
text recognition datasets.
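The framework described above separates a character-visual classifier from temporal and linguistic context. Below is a minimal PyTorch-style sketch of that decoupling idea, written only from the abstract: the class names, the detach-based temporal attention, and the additive context anchor are illustrative assumptions, not the authors' released implementation.
```python
# Sketch of character-context decoupling (assumed names and fusion scheme).
import torch
import torch.nn as nn


class DetachedTemporalAttention(nn.Module):
    """Per-step attention over frame features; attention weights are computed
    on detached features so temporal (order/length) modelling does not bias
    the character-visual representation."""

    def __init__(self, feat_dim: int, max_len: int):
        super().__init__()
        self.query = nn.Embedding(max_len, feat_dim)  # one query per decoding step
        self.scale = feat_dim ** -0.5

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, D) visual frame features
        q = self.query.weight.unsqueeze(0)                          # (1, L, D)
        attn = (q @ frames.detach().transpose(1, 2)) * self.scale   # (B, L, T)
        attn = attn.softmax(dim=-1)
        # the glimpse itself still carries gradients to the visual backbone
        return attn @ frames                                        # (B, L, D)


class CharacterContextDecoupler(nn.Module):
    """Keeps character-visual classification separate from a context-anchor
    branch that absorbs linguistic (n-gram style) statistics during training."""

    def __init__(self, feat_dim: int, num_classes: int, max_len: int):
        super().__init__()
        self.temporal_attn = DetachedTemporalAttention(feat_dim, max_len)
        self.visual_cls = nn.Linear(feat_dim, num_classes)  # character-visual head
        self.context_anchor = nn.Parameter(torch.zeros(max_len, num_classes))

    def forward(self, frames: torch.Tensor, use_context: bool = True):
        glimpses = self.temporal_attn(frames)                # (B, L, D)
        visual_logits = self.visual_cls(glimpses)            # (B, L, C)
        if use_context:
            # additive context bias; dropping it at evaluation keeps scores
            # for novel characters driven by visual evidence only
            return visual_logits + self.context_anchor
        return visual_logits


# Usage: batch of 2 images, 25 frame features of width 256, 100 known classes
model = CharacterContextDecoupler(feat_dim=256, num_classes=100, max_len=30)
frames = torch.randn(2, 25, 256)
train_logits = model(frames, use_context=True)
open_set_logits = model(frames, use_context=False)
```
Dropping the context anchor at evaluation time mirrors the goal stated in the abstract: predictions for novel characters should rest on visual evidence rather than on linguistic bias learned from the training vocabulary.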
Related papers
- MARS: Paying more attention to visual attributes for text-based person search [6.438244172631555]
This paper presents a novel text-based person search (TBPS) architecture named MARS (Mae-Attribute-Relation-Sensitive).
It enhances current state-of-the-art models by introducing two key components: a Visual Reconstruction Loss and an Attribute Loss.
Experiments on three commonly used datasets, namely CUHK-PEDES, ICFG-PEDES, and RSTPReid, report performance improvements.
arXiv Detail & Related papers (2024-07-05T06:44:43Z)
- Putting Context in Context: the Impact of Discussion Structure on Text Classification [13.15873889847739]
We propose a series of experiments on a large dataset for stance detection in English.
We evaluate the contribution of different types of contextual information.
We show that structural information can be highly beneficial to text classification but only under certain circumstances.
arXiv Detail & Related papers (2024-02-05T12:56:22Z)
- Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation [22.38338205905379]
We leverage reinforcement learning algorithms to overcome the above challenges by introducing a novel reward function.
Our reward function combines an accuracy metric and a faithfulness metric to provide a balanced quality judgment of generated responses.
arXiv Detail & Related papers (2023-11-02T02:42:41Z)
- Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and distance-weighted similarity loss to aggregate contextual information and argumentative information.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z)
- Text-Only Training for Visual Storytelling [107.19873669536523]
We formulate visual storytelling as a visual-conditioned story generation problem.
We propose a text-only training method that separates the learning of cross-modality alignment and story generation.
arXiv Detail & Related papers (2023-08-17T09:32:17Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% when transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network [70.47504933083218]
We propose a Visual Language Modeling Network (VisionLAN), which views the visual and linguistic information as a union.
VisionLAN significantly improves the speed by 39% and adaptively considers the linguistic information to enhance the visual features for accurate recognition.
arXiv Detail & Related papers (2021-08-22T07:56:24Z)
- Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text.
Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv Detail & Related papers (2020-06-01T03:36:01Z)
- Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization [86.45110800123216]
In the task of text summarization, salience estimation for words, phrases or sentences is a critical component.
We propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation.
arXiv Detail & Related papers (2020-04-07T02:38:56Z)
- Matching Text with Deep Mutual Information Estimation [0.0]
We present a neural approach for general-purpose text matching with deep mutual information estimation incorporated.
Our approach, Text matching with Deep Info Max (TIM), is combined with a procedure for unsupervised representation learning.
We evaluate our text matching approach on several tasks including natural language inference, paraphrase identification, and answer selection.
arXiv Detail & Related papers (2020-03-09T15:25:37Z)