On Task-personalized Multimodal Few-shot Learning for Visually-rich
Document Entity Retrieval
- URL: http://arxiv.org/abs/2311.00693v2
- Date: Sat, 9 Dec 2023 00:21:29 GMT
- Title: On Task-personalized Multimodal Few-shot Learning for Visually-rich
Document Entity Retrieval
- Authors: Jiayi Chen, Hanjun Dai, Bo Dai, Aidong Zhang, Wei Wei
- Abstract summary: Few-shot visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
- Score: 59.25292920967197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visually-rich document entity retrieval (VDER), which extracts key
information (e.g. date, address) from document images like invoices and
receipts, has become an important topic in industrial NLP applications. The
emergence of new document types at a constant pace, each with its unique entity
types, presents a unique challenge: many documents contain unseen entity types
that occur only a couple of times. Addressing this challenge requires models to
have the ability of learning entities in a few-shot manner. However, prior
works for Few-shot VDER mainly address the problem at the document level with a
predefined global entity space, which doesn't account for the entity-level
few-shot scenario: target entity types are locally personalized by each task
and entity occurrences vary significantly among documents. To address this
unexplored scenario, this paper studies a novel entity-level few-shot VDER
task. The challenges lie in the uniqueness of the label space for each task and
the increased complexity of out-of-distribution (OOD) contents. To tackle this
novel task, we present a task-aware meta-learning based framework, with a
central focus on achieving effective task personalization that distinguishes
between in-task and out-of-task distribution. Specifically, we adopt a
hierarchical decoder (HC) and employ contrastive learning (ContrastProtoNet) to
achieve this goal. Furthermore, we introduce a new dataset, FewVEX, to boost
future research in the field of entity-level few-shot VDER. Experimental
results demonstrate our approaches significantly improve the robustness of
popular meta-learning baselines.
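The abstract describes the approach only at a high level: per-task prototypes combined with contrastive learning (ContrastProtoNet) that separates in-task from out-of-task tokens. The PyTorch snippet below is a minimal sketch of one plausible reading of such an episodic objective; the function name episode_loss, the tensor layout, and the equal weighting of the two loss terms are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch only: a prototypical episode with a contrastive term over
# in-task and out-of-task tokens. Names and loss weighting are assumptions,
# not the paper's actual ContrastProtoNet code.
import torch
import torch.nn.functional as F

def episode_loss(support_emb, support_lab, query_emb, query_lab, tau=0.1):
    # support_emb: [Ns, d] token embeddings from the support documents
    # support_lab: [Ns]    entity ids local to this task (0 = out-of-task/background)
    # query_emb:   [Nq, d], query_lab: [Nq] the same for the query documents
    # (assumes every query entity id also occurs in the support set)
    classes = support_lab.unique()  # the task-personalized label space
    # Per-task prototypes: mean support embedding of each locally defined entity type.
    protos = torch.stack([support_emb[support_lab == c].mean(dim=0) for c in classes])
    # Prototypical classification of query tokens by negative squared distance.
    logits = -torch.cdist(query_emb, protos) ** 2
    targets = torch.searchsorted(classes, query_lab)
    proto_loss = F.cross_entropy(logits, targets)
    # Contrastive term: pull tokens of the same entity type together and push
    # apart tokens of different types, including out-of-task (background) tokens.
    z = F.normalize(torch.cat([support_emb, query_emb]), dim=-1)
    y = torch.cat([support_lab, query_lab])
    sim = (z @ z.t()) / tau
    sim.fill_diagonal_(-1e9)  # exclude self-similarity
    pos = (y[:, None] == y[None, :]).float().fill_diagonal_(0)
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    contrast_loss = -(pos * log_prob).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return proto_loss + contrast_loss.mean()
```

In a meta-learning setup, a loss of this form would be computed per sampled episode (one task-personalized label space at a time) and averaged across episodes during meta-training.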
Related papers
- BuDDIE: A Business Document Dataset for Multi-task Information Extraction [18.440587946049845]
BuDDIE is the first multi-task dataset of 1,665 real-world business documents.
Our dataset consists of publicly available business entity documents from US state government websites.
arXiv Detail & Related papers (2024-04-05T10:26:42Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks with few or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- DocumentNet: Bridging the Data Gap in Document Pre-Training [78.01647768018485]
We propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models.
The collected dataset, named DocumentNet, does not depend on specific document types or entity sets.
Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training.
arXiv Detail & Related papers (2023-06-15T08:21:15Z)
- Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z)
- VRDU: A Benchmark for Visually-rich Document Understanding [22.040372755535767]
We identify the desiderata for a more comprehensive benchmark and propose one we call Visually Rich Document Understanding (VRDU).
VRDU contains two datasets that represent several challenges: rich schema including diverse data types as well as hierarchical entities, complex templates including tables and multi-column layouts, and diversity of different layouts (templates) within a single document type.
We design few-shot and conventional experiment settings along with a carefully designed matching algorithm to evaluate extraction results.
arXiv Detail & Related papers (2022-11-15T03:17:07Z)
- FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations upon which there are 10 and 7 tasks annotated, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs.
arXiv Detail & Related papers (2022-05-12T17:59:00Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given, and the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach that utilizes distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z)