One-shot Key Information Extraction from Document with Deep Partial
Graph Matching
- URL: http://arxiv.org/abs/2109.13967v1
- Date: Sun, 26 Sep 2021 07:45:53 GMT
- Title: One-shot Key Information Extraction from Document with Deep Partial
Graph Matching
- Authors: Minghong Yao, Zhiguang Liu, Liangwei Wang, Houqiang Li, Liansheng
Zhuang
- Abstract summary: Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and train separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
- Score: 60.48651298832829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automating the Key Information Extraction (KIE) from documents improves
efficiency, productivity, and security in many industrial scenarios such as
rapid indexing and archiving. Many existing supervised learning methods for the
KIE task require a large number of labeled samples and learn separate models for
different types of documents. However, collecting and labeling a
large dataset is time-consuming and is not a user-friendly requirement for many
cloud platforms. To overcome these challenges, we propose a deep end-to-end
trainable network for one-shot KIE using partial graph matching. Unlike
previous methods, in which similarity learning and solving are optimized
separately, our method learns the two processes jointly in an end-to-end
framework. Existing one-shot KIE methods are either template-based or simple
attention-based learning approaches that struggle to handle texts shifted away
from their expected positions by printers, as illustrated in Fig. 1. To solve
this problem, we add a one-to-(at most)-one constraint so that we find the
globally optimal solution even if some texts have drifted.
Further, we design a multimodal context ensemble block to boost the performance
through fusing features of spatial, textual, and aspect representations. To
promote research of KIE, we collected and annotated a one-shot document KIE
dataset named DKIE with diverse types of images. The DKIE dataset consists of
2.5K document images captured by mobile phones in natural scenes, and it is the
largest available one-shot KIE dataset up to now. The results of experiments on
DKIE show that our method achieved state-of-the-art performance compared with
recent one-shot and supervised learning approaches. The dataset and proposed
one-shot KIE model will be released soon.
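The one-to-(at most)-one constraint described above can be illustrated as a partial assignment problem. Below is a minimal sketch (not the authors' implementation) showing one common way to realize it: pad the similarity matrix with dummy columns so each query node is matched to at most one target node, or left unmatched when no similarity exceeds a threshold. The function name `partial_match` and the `unmatched_cost` parameter are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def partial_match(sim, unmatched_cost=0.0):
    """One-to-(at most)-one matching via a padded assignment problem.

    sim: (n, m) similarity matrix between query and target text nodes.
    unmatched_cost: similarity a dummy match yields; a query node whose
    best available similarity falls below this stays unmatched.
    Returns {query_index: target_index} for matched pairs only.
    """
    n, m = sim.shape
    # Minimize cost = negative similarity; n dummy columns let any
    # query node opt out of matching at cost -unmatched_cost.
    cost = np.full((n, m + n), -unmatched_cost)
    cost[:, :m] = -sim
    rows, cols = linear_sum_assignment(cost)
    # Drop assignments to dummy columns (unmatched query nodes).
    return {int(r): int(c) for r, c in zip(rows, cols) if c < m}
```

Because the assignment is solved globally, a text box drifted toward a wrong neighbor cannot "steal" a target that another box matches more strongly, which is the intuition behind the robustness to printer shifts.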
Related papers
- mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding [103.05835688963947]
We propose a High-resolution DocCompressor module to compress each high-resolution document image into 324 tokens.
DocOwl2 sets a new state-of-the-art across multi-page document understanding benchmarks and reduces first token latency by more than 50%.
Compared to single-image MLLMs trained on similar data, our DocOwl2 achieves comparable single-page understanding performance with less than 20% of the visual tokens.
arXiv Detail & Related papers (2024-09-05T11:09:00Z)
- DECDM: Document Enhancement using Cycle-Consistent Diffusion Models [3.3813766129849845]
We propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models.
Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models.
We also introduce simple data augmentation strategies to improve character-glyph conservation during translation.
arXiv Detail & Related papers (2023-11-16T07:16:02Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Few-shot visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
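The copy-paste pair generation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions (not the paper's pipeline): the helper name `paste_segment` is hypothetical, images are HxWxC uint8 arrays, the segment mask is boolean, and the pasted region is assumed to fit within the destination bounds.

```python
import numpy as np

def paste_segment(src, mask, dst, y, x):
    """Copy-paste a masked segment from `src` into `dst` at offset (y, x).

    src: (h, w, C) uint8 source crop; mask: (h, w) bool segment mask;
    dst: (H, W, C) uint8 destination image.
    Returns the composite image and the target mask locating the segment,
    which together form one synthetic training pair.
    """
    h, w = mask.shape
    out = dst.copy()
    # Overwrite only the masked pixels inside the target window.
    region = out[y:y + h, x:x + w]
    region[mask] = src[mask]
    # Ground-truth mask for the pasted segment in destination coordinates.
    new_mask = np.zeros(dst.shape[:2], dtype=bool)
    new_mask[y:y + h, x:x + w] = mask
    return out, new_mask
```

Training on such pairs gives dense correspondence supervision for free, since the location of the pasted segment in the composite image is known exactly.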
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- Text-Based Person Search with Limited Data [66.26504077270356]
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query.
We present a framework with two novel components to handle the problems brought by limited data.
arXiv Detail & Related papers (2021-10-20T22:20:47Z)
- Spatial Dual-Modality Graph Reasoning for Key Information Extraction [31.04597531115209]
We propose an end-to-end Spatial Dual-Modality Graph Reasoning method (SDMG-R) to extract key information from unstructured document images.
We release a new dataset named WildReceipt, which is collected and annotated for the evaluation of key information extraction from document images of unseen templates in the wild.
arXiv Detail & Related papers (2021-03-26T13:46:00Z)
- PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks [5.210482046387142]
Key Information Extraction from documents remains a challenge.
We introduce PICK, a framework that is effective and robust in handling complex documents layout for KIE.
Our method outperforms baseline methods by significant margins.
arXiv Detail & Related papers (2020-04-16T05:20:16Z)