UniVIE: A Unified Label Space Approach to Visual Information Extraction
from Form-like Documents
- URL: http://arxiv.org/abs/2401.09220v1
- Date: Wed, 17 Jan 2024 14:02:36 GMT
- Title: UniVIE: A Unified Label Space Approach to Visual Information Extraction
from Form-like Documents
- Authors: Kai Hu, Jiawei Wang, Weihong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo
- Abstract summary: We present a new perspective, reframing VIE as a relation prediction problem and unifying labels of different tasks into a single label space.
This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.
We present UniVIE, a unified model that addresses the VIE problem comprehensively.
- Score: 11.761942458294136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for Visual Information Extraction (VIE) from form-like
documents typically fragment the process into separate subtasks, such as key
information extraction, key-value pair extraction, and choice group extraction.
However, these approaches often overlook the hierarchical structure of form
documents, including hierarchical key-value pairs and hierarchical choice
groups. To address these limitations, we present a new perspective, reframing
VIE as a relation prediction problem and unifying labels of different tasks
into a single label space. This unified approach allows for the definition of
various relation types and effectively tackles hierarchical relationships in
form-like documents. In line with this perspective, we present UniVIE, a
unified model that addresses the VIE problem comprehensively. UniVIE functions
using a coarse-to-fine strategy. It initially generates tree proposals through
a tree proposal network, which are subsequently refined into hierarchical trees
by a relation decoder module. To enhance the relation prediction capabilities
of UniVIE, we incorporate two novel tree constraints into the relation decoder:
a tree attention mask and a tree level embedding. Extensive experimental
evaluations on both our in-house dataset HierForms and the publicly available
SIBR dataset substantiate that our method achieves state-of-the-art results,
underscoring the effectiveness and potential of our unified approach in
advancing the field of VIE.
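The paper itself does not include code, but the architecture described above (relation prediction over a unified label space, with a relation decoder constrained by a tree attention mask and a tree level embedding) can be made concrete with a minimal, hypothetical sketch. The sketch below assumes a PyTorch-style setup; the relation names, module layout, and masking rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): relation prediction over a
# unified label space with a tree attention mask and a tree level embedding.
import torch
import torch.nn as nn

# A single label space covering relation types that the separate VIE subtasks
# would otherwise use (names here are illustrative assumptions).
UNIFIED_RELATIONS = ["no-relation", "key-value", "choice-group", "parent-child"]


def tree_attention_mask(parent: torch.Tensor) -> torch.Tensor:
    """Boolean mask allowing each node to attend to itself and its ancestors.

    parent[i] is the index of node i's parent in the tree proposal (-1 for roots).
    """
    n = parent.numel()
    mask = torch.eye(n, dtype=torch.bool)
    for i in range(n):
        j = int(parent[i])
        while j >= 0:                 # walk up the proposed tree
            mask[i, j] = True
            j = int(parent[j])
    return mask


class RelationDecoder(nn.Module):
    """Scores every ordered node pair against the unified relation label space."""

    def __init__(self, dim: int = 256, max_depth: int = 8):
        super().__init__()
        self.level_embed = nn.Embedding(max_depth, dim)   # tree level embedding
        self.encoder = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.pair_head = nn.Linear(2 * dim, len(UNIFIED_RELATIONS))

    def forward(self, feats, depth, parent):
        # feats: (1, n, dim) node features from the tree proposal stage
        # depth: (n,) long tensor of node depths; parent: (n,) parent indices
        x = feats + self.level_embed(depth).unsqueeze(0)
        attn_mask = ~tree_attention_mask(parent)          # True = blocked
        x = self.encoder(x, src_mask=attn_mask)
        n = x.size(1)
        pairs = torch.cat(
            [x[:, :, None, :].expand(-1, n, n, -1),
             x[:, None, :, :].expand(-1, n, n, -1)], dim=-1)
        return self.pair_head(pairs)                      # (1, n, n, |relations|)
```

In this sketch the tree attention mask restricts each node to its ancestors in the proposal tree and the level embedding injects each node's depth; UniVIE's actual constraints and decoder layout may differ in detail.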
Related papers
- Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification [20.434941308959786]
Long document classification presents challenges due to the extensive content and complex structure of long documents.
Existing methods often struggle with token limits and fail to adequately model hierarchical relationships within documents.
Our approach integrates syntax trees for sentence encodings and document graphs for document encodings, which capture fine-grained syntactic relationships and broader document contexts.
arXiv Detail & Related papers (2024-10-03T19:25:01Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Tree Variational Autoencoders [5.992683455757179]
We propose a new generative hierarchical clustering model that learns a flexible tree-based posterior distribution over latent variables.
TreeVAE hierarchically divides samples according to their intrinsic characteristics, shedding light on hidden structures in the data.
arXiv Detail & Related papers (2023-06-15T09:25:04Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
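As a rough, hypothetical illustration of the merge rule summarized above (merging the pair of clusters with the highest average dot product rather than the minimum distance), a NumPy sketch of the agglomerative loop might look as follows; it is not the authors' implementation.

```python
# Hypothetical sketch of agglomerative clustering where, at each step, the pair
# of clusters with the highest average pairwise dot product is merged.
import numpy as np


def dot_product_agglomeration(points: np.ndarray):
    """points: (n, d) array. Returns the sequence of merges as (cluster, cluster) pairs."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average dot product over all cross-cluster point pairs
                score = np.mean(points[clusters[a]] @ points[clusters[b]].T)
                if score > best:
                    best, best_pair = score, (a, b)
        a, b = best_pair
        merges.append((clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```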
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Learnable Pillar-based Re-ranking for Image-Text Retrieval [119.9979224297237]
Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities.
Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks.
We propose a novel learnable pillar-based re-ranking paradigm for image-text retrieval.
arXiv Detail & Related papers (2023-04-25T04:33:27Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Hierarchical Relationships: A New Perspective to Enhance Scene Graph Generation [8.28849026314542]
This paper presents a finding that leveraging the hierarchical structures among labels for relationships and objects can substantially improve the performance of scene graph generation systems.
We introduce a Bayesian prediction head to jointly predict the super-category of relationships between a pair of object instances.
Experiments on the Visual Genome dataset show its strong performance, particularly in predicate classifications and zero-shot settings.
arXiv Detail & Related papers (2023-03-13T04:16:42Z)
- ReSel: N-ary Relation Extraction from Scientific Text and Tables by Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z)
- MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction [48.55908127994688]
We propose MatchVIE, a novel key-value matching model based on a graph neural network for VIE.
Through key-value matching based on relevancy evaluation, MatchVIE can bypass recognizing various entity semantics.
We introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values.
arXiv Detail & Related papers (2021-06-24T12:06:29Z)
- Interactive Steering of Hierarchical Clustering [30.371250297444703]
We present an interactive steering method to visually supervise constrained hierarchical clustering by utilizing both public knowledge (e.g., Wikipedia) and private knowledge from users.
The novelty of our approach includes 1) automatically constructing constraints for hierarchical clustering using knowledge (knowledge-driven) and intrinsic data distribution (data-driven)
To clearly convey the hierarchical clustering results, an uncertainty-aware tree visualization has been developed to enable users to quickly locate the most uncertain sub-hierarchies.
arXiv Detail & Related papers (2020-09-21T05:26:07Z)