One-shot Text Field Labeling using Attention and Belief Propagation for
Structure Information Extraction
- URL: http://arxiv.org/abs/2009.04153v1
- Date: Wed, 9 Sep 2020 08:11:34 GMT
- Title: One-shot Text Field Labeling using Attention and Belief Propagation for
Structure Information Extraction
- Authors: Mengli Cheng, Minghui Qiu, Xing Shi, Jun Huang, Wei Lin
- Abstract summary: We propose a novel deep end-to-end trainable approach for one-shot text field labeling.
We collected and annotated a real-world one-shot field labeling dataset.
- Score: 28.687815600404264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Structured information extraction from document images usually consists of
three steps: text detection, text recognition, and text field labeling. While
text detection and text recognition have been extensively studied and improved
in the literature, text field labeling is less explored and still faces many
challenges. Existing learning-based methods for the text field labeling task
usually require a large number of labeled examples to train a specific model for each
type of document. However, collecting large amounts of document images and
labeling them is difficult and sometimes impossible due to privacy issues.
Deploying separate models for each type of document also consumes a lot of
resources. Facing these challenges, we explore one-shot learning for the text
field labeling task. Existing one-shot learning methods for the task are mostly
rule-based and have difficulty in labeling fields in crowded regions with few
landmarks and fields consisting of multiple separate text regions. To alleviate
these problems, we propose a novel deep end-to-end trainable approach for
one-shot text field labeling, which makes use of an attention mechanism to
transfer layout information between document images. We further apply a
conditional random field to the transferred layout information to refine
the field labeling. We collected and annotated a real-world one-shot
field labeling dataset with a large variety of document types and conducted
extensive experiments to examine the effectiveness of the proposed model. To
stimulate research in this direction, the collected dataset and the one-shot
model will be released.
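
The abstract's first component, attention-based layout transfer, can be sketched concretely. The following is a minimal PyTorch sketch under stated assumptions: each detected text box is summarized by a feature vector (geometry plus a text embedding), and the class name, dimensions, and dot-product attention are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayoutTransfer(nn.Module):
    """Transfer field labels from a one-shot support document to a query
    document via cross-attention over text-box features (hypothetical)."""

    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        # Project box features (e.g. normalized x, y, w, h plus a text
        # embedding) into a shared space for attention scoring.
        self.q_proj = nn.Linear(feat_dim, hidden)
        self.k_proj = nn.Linear(feat_dim, hidden)

    def forward(self, query_feats, support_feats, support_labels, num_fields):
        # query_feats:    (Nq, feat_dim) boxes of the unlabeled document
        # support_feats:  (Ns, feat_dim) boxes of the labeled one-shot example
        # support_labels: (Ns,) integer field id of each support box
        q = self.q_proj(query_feats)                              # (Nq, hidden)
        k = self.k_proj(support_feats)                            # (Ns, hidden)
        attn = F.softmax(q @ k.t() / k.size(-1) ** 0.5, dim=-1)   # (Nq, Ns)
        one_hot = F.one_hot(support_labels, num_fields).float()   # (Ns, F)
        # Each query box inherits a soft distribution over field labels
        # through its attention weights on the support boxes.
        return attn @ one_hot                                     # (Nq, F)
```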
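The second component refines these transferred distributions with a conditional random field; since the title also mentions belief propagation, a toy max-product loopy belief propagation over pairwise potentials between neighboring boxes gives the flavor. The graph construction, potentials, and iteration count below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def refine_labels(unary, edges, pairwise, n_iters=10):
    """Toy loopy max-product BP: neighboring text boxes exchange messages
    so that, e.g., a field split across several boxes settles on one label."""
    # unary:    (N, F) log-potentials per box (e.g. from the attention transfer)
    # edges:    list of (i, j) index pairs of neighboring text boxes
    # pairwise: (F, F) log-compatibility of labels on adjacent boxes
    N, F = unary.shape
    msgs = {(s, t): np.zeros(F) for (i, j) in edges for (s, t) in ((i, j), (j, i))}

    def incoming(node, exclude=None):
        total = np.zeros(F)
        for (s, t), m in msgs.items():
            if t == node and s != exclude:
                total += m
        return total

    for _ in range(n_iters):
        for (s, t) in list(msgs):
            # Belief at s, excluding the message previously sent by t.
            b = unary[s] + incoming(s, exclude=t)
            # Max-product update in log space, normalized for stability.
            msg = np.max(pairwise + b[:, None], axis=0)
            msgs[(s, t)] = msg - msg.max()
    beliefs = np.stack([unary[i] + incoming(i) for i in range(N)])
    return beliefs.argmax(axis=1)  # refined field id per box
```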
Related papers
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by an open vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions [22.090074821554754]
We propose a self-supervised scene text segmentation algorithm with layered decoupling of representations, derived in an object-centric manner, to segment images into texts and background.
On several public scene text datasets, our method outperforms state-of-the-art unsupervised segmentation algorithms.
arXiv Detail & Related papers (2023-08-25T05:00:05Z)
- Domain Adaptive Scene Text Detection via Subcategorization [45.580559833129165]
We study domain adaptive scene text detection, a largely neglected yet very meaningful task.
We design SCAST, a subcategory-aware self-training technique that mitigates network overfitting and noisy pseudo labels.
SCAST achieves superior detection performance consistently across multiple public benchmarks.
arXiv Detail & Related papers (2022-12-01T09:15:43Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that can simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts with scribble lines instead of polygons for text detection.
It is a general labeling method for texts of various shapes and has a low labeling cost.
Experiments show that the proposed method bridges the performance gap between the weak scribble-line labeling and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
- Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval [8.317191999275536]
This paper focuses on leveraging multi-modal content, in the form of visual and textual cues, to tackle fine-grained image classification and retrieval.
We employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between salient objects and text found in an image.
arXiv Detail & Related papers (2020-09-21T12:31:42Z)
- Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
- Let Me Choose: From Verbal Context to Font Selection [50.293897197235296]
We aim to learn associations between visual attributes of fonts and the verbal context of the texts they are typically applied to.
We introduce a new dataset, containing examples of different topics in social media posts and ads, labeled through crowd-sourcing.
arXiv Detail & Related papers (2020-05-03T17:36:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.