EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
- URL: http://arxiv.org/abs/2311.13993v1
- Date: Thu, 23 Nov 2023 13:20:42 GMT
- Title: EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
- Authors: Abhishek Singh, Venkatapathy Subramanian, Ayush Maheshwari, Pradeep Narayan, Devi Prasad Shetty, Ganesh Ramakrishnan
- Abstract summary: Information Extraction from document images is challenging due to the high variability of layout formats.
We propose a novel approach, EIGEN, which combines rule-based methods with deep learning models using data programming approaches.
We empirically show that our EIGEN framework can significantly improve the performance of state-of-the-art deep models with the availability of very few labeled data instances.
- Score: 27.36816896426097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information Extraction (IE) from document images is challenging due to the
high variability of layout formats. Deep models such as LayoutLM and BROS have
been proposed to address this problem and have shown promising results.
However, they still require a large number of field-level annotations for
training. Rule-based approaches have also been proposed that exploit an
understanding of a form's layout and semantics, such as the geometric position
or type of its fields. In this work, we propose a
novel approach, EIGEN (Expert-Informed Joint Learning aGgrEatioN), which
combines rule-based methods with deep learning models using data programming
approaches to circumvent the requirement of annotation of large amounts of
training data. Specifically, EIGEN consolidates weak labels induced from
multiple heuristics through generative models and uses them along with a small
number of annotated labels to jointly train a deep model. In our framework, we
propose labeling functions that incorporate contextual information, thus
capturing the visual and language context of a word for
accurate categorization. We empirically show that our EIGEN framework can
significantly improve the performance of state-of-the-art deep models with the
availability of very few labeled data instances. The source code is available at
https://github.com/ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Images.
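The data-programming idea in the abstract (heuristic labeling functions over a word's position, type, and context, whose weak votes are consolidated and combined with a few gold labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Word` class, the field names `DATE`/`AMOUNT`/`OTHER`, and every labeling function are hypothetical; majority vote is swapped in for the generative label model EIGEN actually uses; and `training_targets` simply prefers gold labels over weak ones, whereas EIGEN trains the deep model jointly.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional, Sequence

ABSTAIN = None  # a labeling function returns None when it has no opinion


@dataclass
class Word:
    """Hypothetical toy representation of an OCR'd word."""
    text: str
    x: float  # left edge of the bounding box, normalized to 0..1
    y: float  # top edge of the bounding box, normalized to 0..1
    left_neighbor: str = ""  # text of the word immediately to the left (context)


LabelingFunction = Callable[[Word], Optional[str]]
NUMBER = r"[\d,]+(\.\d+)?"


def lf_date_regex(w: Word) -> Optional[str]:
    # Type heuristic: dd/mm/yyyy-like strings are dates.
    return "DATE" if re.fullmatch(r"\d{1,2}/\d{1,2}/\d{2,4}", w.text) else ABSTAIN


def lf_amount_after_total(w: Word) -> Optional[str]:
    # Contextual heuristic: a number right after the keyword "Total" is an amount.
    if w.left_neighbor.lower().rstrip(":") == "total" and re.fullmatch(NUMBER, w.text):
        return "AMOUNT"
    return ABSTAIN


def lf_amount_right_column(w: Word) -> Optional[str]:
    # Geometric heuristic: numbers in the right-most column tend to be amounts.
    return "AMOUNT" if w.x > 0.75 and re.fullmatch(NUMBER, w.text) else ABSTAIN


def lf_header_date(w: Word) -> Optional[str]:
    # Geometric heuristic: slash-separated tokens near the top of the page are dates.
    return "DATE" if w.y < 0.2 and "/" in w.text else ABSTAIN


LFS: List[LabelingFunction] = [
    lf_date_regex, lf_amount_after_total, lf_amount_right_column, lf_header_date,
]


def weak_label(w: Word, lfs: Sequence[LabelingFunction] = LFS) -> Optional[str]:
    """Majority vote over non-abstaining LFs; ties and all-abstain give None.

    Stand-in for a generative label model, which would instead learn
    per-function accuracies rather than counting every vote equally.
    """
    votes = Counter(label for label in (lf(w) for lf in lfs) if label is not ABSTAIN)
    if not votes:
        return None
    (top, count), *rest = votes.most_common()
    if rest and rest[0][1] == count:
        return None  # tie: no confident weak label
    return top


def training_targets(words: Sequence[Word], gold: Mapping[int, str]) -> List[Optional[str]]:
    """Use the gold label where one exists, otherwise fall back to the weak label."""
    return [gold.get(i) or weak_label(w) for i, w in enumerate(words)]
```

For instance, a word "1,250.00" at the right edge of the page whose left neighbour is "Total:" receives two AMOUNT votes and is weakly labeled AMOUNT, while a word on which no function fires stays unlabeled unless a gold label is supplied for it.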
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
Date: 2024-10-31T06:55:24Z
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
Date: 2024-10-08T15:22:36Z
- GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph Encoding [39.67113788660731]
We introduce a framework for developing Graph-aligned LAnguage Models (GLaM).
We demonstrate that grounding the models in specific graph-based knowledge expands the models' capacity for structure-based reasoning.
Date: 2024-02-09T19:53:29Z
- DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
Date: 2023-08-11T14:38:11Z
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
Date: 2023-03-23T08:21:16Z
- A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck [68.61583160269664]
Event argument extraction (EAE) aims to extract arguments with given roles from texts.
We propose a multi-format transfer learning model with variational information bottleneck.
We conduct extensive experiments on three benchmark datasets, and obtain new state-of-the-art performance on EAE.
Date: 2022-08-27T13:52:01Z
- Representing Knowledge by Spans: A Knowledge-Enhanced Model for Information Extraction [7.077412533545456]
We propose a new pre-trained model that learns representations of both entities and relationships simultaneously.
By encoding spans efficiently with span modules, our model can represent both entities and their relationships but requires fewer parameters than existing models.
Date: 2022-08-20T07:32:25Z
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
Date: 2020-04-30T23:58:03Z
- G2MF-WA: Geometric Multi-Model Fitting with Weakly Annotated Data [15.499276649167975]
In weak annotation, most manual annotations are assumed to be correct but are inevitably mixed with incorrect ones.
We propose a novel method to make full use of weakly annotated (WA) data to boost multi-model fitting performance.
Date: 2020-01-20T04:22:01Z
This list is automatically generated from the titles and abstracts of the papers on this site.