A Question-Answering Approach to Key Value Pair Extraction from
Form-like Document Images
- URL: http://arxiv.org/abs/2304.07957v1
- Date: Mon, 17 Apr 2023 02:55:31 GMT
- Title: A Question-Answering Approach to Key Value Pair Extraction from
Form-like Document Images
- Authors: Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, Weihong Lin, Lei Sun, Qiang Huo
- Abstract summary: We present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer.
We propose a coarse-to-fine answer prediction approach to achieve higher answer prediction accuracy.
Our proposed KVPFormer achieves state-of-the-art results on the FUNSD and XFUND datasets, outperforming the previous best-performing method by 7.2% and 13.2% in F1 score, respectively.
- Score: 8.73248722579337
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we present a new question-answering (QA) based key-value pair
extraction approach, called KVPFormer, to robustly extract key-value relationships
between entities from form-like document images. Specifically, KVPFormer first
identifies key entities from all entities in an image with a Transformer encoder,
then takes these key entities as questions and feeds them into a Transformer decoder
to predict their corresponding answers (i.e., value entities) in parallel. To achieve
higher answer prediction accuracy, we further propose a coarse-to-fine answer
prediction approach, which first extracts multiple answer candidates for each
identified question in the coarse stage and then selects the most likely one among
these candidates in the fine stage. In this way, the learning difficulty of answer
prediction is effectively reduced and the prediction accuracy is improved. Moreover,
we introduce a spatial compatibility attention bias into the self-attention/cross-attention
mechanism so that KVPFormer can better model the spatial interactions between entities.
With these new techniques, our proposed KVPFormer achieves state-of-the-art results on
the FUNSD and XFUND datasets, outperforming the previous best-performing method by
7.2% and 13.2% in F1 score, respectively.
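To make the pipeline above concrete, the following is a minimal PyTorch-style sketch of a QA-formulated key-value extractor with a spatial compatibility attention bias and coarse-to-fine answer selection. All module names, dimensions, and the exact form of the spatial bias are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class SpatialBiasSelfAttention(nn.Module):
    """Self-attention whose logits are shifted by a learned bias computed from
    pairwise box geometry (a stand-in for a spatial compatibility attention bias)."""

    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bias_mlp = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, n_heads))
        self.n_heads = n_heads

    def forward(self, x, boxes):
        # boxes: (B, N, 4) normalized [x0, y0, x1, y1] per entity
        rel = boxes.unsqueeze(2) - boxes.unsqueeze(1)              # (B, N, N, 4) pairwise offsets
        bias = self.bias_mlp(rel).permute(0, 3, 1, 2)              # (B, heads, N, N)
        B, N, _ = x.shape
        out, _ = self.attn(x, x, x, attn_mask=bias.reshape(B * self.n_heads, N, N))
        return out


class QAKeyValueExtractor(nn.Module):
    """Keys are detected by the encoder, treated as questions, and decoded into
    value ("answer") indices with a coarse-to-fine candidate selection."""

    def __init__(self, d_model=256, n_heads=8, top_k=5):
        super().__init__()
        self.encoder = SpatialBiasSelfAttention(d_model, n_heads)
        self.key_classifier = nn.Linear(d_model, 2)                # key vs. non-key entity
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), num_layers=3)
        self.coarse_head = nn.Linear(d_model, d_model)             # scores every entity as a candidate
        self.fine_head = nn.Bilinear(d_model, d_model, 1)          # re-ranks the top-k candidates
        self.top_k = top_k

    def forward(self, entity_feats, boxes):
        # entity_feats: (B, N, d) features of all entities detected in the form image
        mem = self.encoder(entity_feats, boxes)
        key_logits = self.key_classifier(mem)                      # identify the "question" entities
        key_mask = (key_logits.argmax(-1) == 1).float().unsqueeze(-1)

        # Keys act as decoder queries ("questions"); all entities form the memory.
        dec = self.decoder(mem * key_mask, mem)

        # Coarse stage: score every entity as an answer to each question, keep the top-k.
        coarse = torch.einsum("bqd,bnd->bqn", self.coarse_head(dec), mem)
        _, topk_idx = coarse.topk(self.top_k, dim=-1)              # assumes N >= top_k

        # Fine stage: re-score only the top-k candidates and pick the most likely answer.
        gather_idx = topk_idx.unsqueeze(-1).expand(-1, -1, -1, mem.size(-1))
        cand = torch.gather(mem.unsqueeze(1).expand(-1, dec.size(1), -1, -1), 2, gather_idx)
        queries = dec.unsqueeze(2).expand_as(cand).contiguous()
        fine = self.fine_head(queries, cand.contiguous()).squeeze(-1)
        best = torch.gather(topk_idx, -1, fine.argmax(-1, keepdim=True))
        return key_logits, coarse, best                            # best: predicted value index per key
```

In practice the decoder would consume only the entities classified as keys; the masking above merely keeps the sketch shape-stable and parallel over all entities.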
Related papers
- A Manifold Representation of the Key in Vision Transformers [8.938418994111716]
This paper explores the concept of disentangling the key from the query and value, and adopting a manifold representation for the key.
Our experiments reveal that decoupling and endowing the key with a manifold structure can enhance the model's performance.
arXiv Detail & Related papers (2024-02-01T12:01:43Z)
- Bridging the Domain Gaps in Context Representations for k-Nearest Neighbor Neural Machine Translation [57.49095610777317]
$k$-Nearest neighbor machine translation ($k$NN-MT) has attracted increasing attention due to its ability to non-parametrically adapt to new translation domains.
We propose a novel approach to boost the datastore retrieval of $k$NN-MT by reconstructing the original datastore.
Our method can effectively boost the datastore retrieval and translation quality of $k$NN-MT.
arXiv Detail & Related papers (2023-05-26T03:04:42Z)
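For background on the $k$NN-MT entry above, here is a minimal sketch of the standard kNN-MT interpolation at one decoding step; it illustrates the datastore lookup that the paper improves, not the proposed datastore reconstruction itself. The function and argument names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def knn_mt_step(query, datastore_keys, datastore_values, mt_probs,
                k=8, temperature=10.0, lam=0.5):
    """query: (d,) decoder state; datastore_keys: (M, d) cached decoder states;
    datastore_values: (M,) target token ids (long); mt_probs: (V,) base NMT distribution."""
    dists = torch.cdist(query.unsqueeze(0), datastore_keys).squeeze(0)  # (M,) L2 distances
    knn_dists, knn_idx = dists.topk(k, largest=False)                   # k nearest neighbors
    weights = F.softmax(-knn_dists / temperature, dim=0)                # closer neighbors weigh more
    knn_probs = torch.zeros_like(mt_probs)
    knn_probs.scatter_add_(0, datastore_values[knn_idx], weights)       # aggregate weight per token id
    return lam * knn_probs + (1.0 - lam) * mt_probs                     # interpolated next-token distribution
```

With lam=0 this reduces to the base NMT distribution; larger lam leans more heavily on the retrieved neighbors.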
- Progressive End-to-End Object Detection in Crowded Scenes [96.92416613336096]
Previous query-based detectors suffer from two drawbacks: first, multiple predictions will be inferred for a single object, typically in crowded scenes; second, the performance saturates as the depth of the decoding stage increases.
We propose a progressive predicting method to address the above issues. Specifically, we first select accepted queries to generate true positive predictions, then refine the remaining noisy queries according to the previously accepted predictions.
Experiments show that our method can significantly boost the performance of query-based detectors in crowded scenes.
arXiv Detail & Related papers (2022-03-15T06:12:00Z)
- MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction [48.55908127994688]
We propose a novel key-value matching model based on a graph neural network for VIE (MatchVIE).
Through key-value matching based on relevancy evaluation, the proposed MatchVIE can bypass the recognition of various semantics.
We introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values.
arXiv Detail & Related papers (2021-06-24T12:06:29Z)
- Question Answering Infused Pre-training of General-Purpose Contextualized Representations [70.62967781515127]
We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations.
We accomplish this goal by training a bi-encoder QA model, which independently encodes passages and questions, to match the predictions of a more accurate cross-encoder model.
We show large improvements over both RoBERTa-large and previous state-of-the-art results on zero-shot and few-shot paraphrase detection.
arXiv Detail & Related papers (2021-06-15T14:45:15Z)
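A hedged sketch of the bi-encoder/cross-encoder distillation idea described in the entry above: the bi-encoder encodes the question and the passage independently and is trained to match the cross-encoder's token-level predictions. `BiEncoder.encode_question`, `encode_passage`, and `cross_encoder` are placeholder interfaces assumed for this sketch, not the paper's API.

```python
import torch
import torch.nn.functional as F

def distillation_loss(bi_encoder, cross_encoder, question_ids, passage_ids):
    q_vec = bi_encoder.encode_question(question_ids)               # (B, d) question vector
    p_toks = bi_encoder.encode_passage(passage_ids)                # (B, T, d) passage token vectors
    student_logits = torch.einsum("bd,btd->bt", q_vec, p_toks)     # student answer-start scores
    with torch.no_grad():
        teacher_logits = cross_encoder(question_ids, passage_ids)  # (B, T) teacher scores
    # Match the student's token-level distribution to the teacher's.
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
```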
- Adaptive Bi-directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension [29.717816161964105]
We propose a novel approach called Adaptive Bidirectional Attention, which adaptively exploits source representations at different levels of granularity for the predictor.
Results are better than the previous state-of-the-art model by 2.5% EM and 2.3% F1 scores.
arXiv Detail & Related papers (2020-12-20T09:31:35Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
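A rough, assumption-laden sketch of the reinforcement-learning search loop described in the ACE entry above: a controller samples which candidate embeddings to concatenate, receives the task model's accuracy as reward, and is updated by a policy-gradient step. `train_and_evaluate` is a hypothetical callback, not part of the ACE codebase.

```python
import torch

def ace_search(num_embeddings, train_and_evaluate, steps=30, lr=0.1):
    """train_and_evaluate(selection: list[bool]) -> float is a hypothetical callback that
    trains a task model on the selected embedding concatenation and returns dev accuracy."""
    logits = torch.zeros(num_embeddings, requires_grad=True)    # controller parameters
    optimizer = torch.optim.Adam([logits], lr=lr)
    baseline = 0.0
    for _ in range(steps):
        probs = torch.sigmoid(logits)
        mask = torch.bernoulli(probs)                           # which embeddings to concatenate
        reward = train_and_evaluate(mask.bool().tolist())       # task accuracy as reward
        log_prob = (mask * probs.clamp_min(1e-8).log()
                    + (1 - mask) * (1 - probs).clamp_min(1e-8).log()).sum()
        loss = -(reward - baseline) * log_prob                  # REINFORCE with a moving baseline
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        baseline = 0.9 * baseline + 0.1 * reward
    return torch.sigmoid(logits).detach()                       # selection probability per embedding
```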
- Tradeoffs in Sentence Selection Techniques for Open-Domain Question Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z)
- AMR Parsing via Graph-Sequence Iterative Inference [62.85003739964878]
We propose a new end-to-end model that treats AMR parsing as a series of dual decisions on the input sequence and the incrementally constructed graph.
We show that the answers to these two questions are causally dependent on each other.
We design a model based on iterative inference that helps achieve better answers in both perspectives, leading to greatly improved parsing accuracy.
arXiv Detail & Related papers (2020-04-12T09:15:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.