LongFin: A Multimodal Document Understanding Model for Long Financial
Domain Documents
- URL: http://arxiv.org/abs/2401.15050v1
- Date: Fri, 26 Jan 2024 18:23:45 GMT
- Title: LongFin: A Multimodal Document Understanding Model for Long Financial
Domain Documents
- Authors: Ahmed Masry and Amir Hajian
- Abstract summary: We introduce LongFin, a multimodal document AI model capable of encoding up to 4K tokens.
We also propose the LongForms dataset that encapsulates several industrial challenges in financial documents.
- Score: 4.924255992661131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document AI is a growing research field that focuses on the comprehension and
extraction of information from scanned and digital documents to make everyday
business operations more efficient. Numerous downstream tasks and datasets have
been introduced to facilitate the training of AI models capable of parsing and
extracting information from various document types such as receipts and scanned
forms. Despite these advancements, both existing datasets and models fail to
address critical challenges that arise in industrial contexts. Existing
datasets primarily comprise short documents consisting of a single page, while
existing models are constrained by a limited maximum length, often set at 512
tokens. Consequently, the practical application of these methods in financial
services, where documents can span multiple pages, is severely impeded. To
overcome these challenges, we introduce LongFin, a multimodal document AI model
capable of encoding up to 4K tokens. We also propose the LongForms dataset, a
comprehensive financial dataset that encapsulates several industrial challenges
in financial documents. Through an extensive evaluation, we demonstrate the
effectiveness of the LongFin model on the LongForms dataset, surpassing the
performance of existing public models while maintaining comparable results on
existing single-page benchmarks.
Related papers
- M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework [75.95430061891828]
We introduce M-LongDoc, a benchmark of 851 samples, and an automated framework to evaluate the performance of large multimodal models.
We propose a retrieval-aware tuning approach for efficient and effective multimodal document reading.
arXiv Detail & Related papers (2024-11-09T13:30:38Z) - CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation [51.2289822267563]
We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets.
We use large-scale public web-crawled corpora and similarity-based document retrieval to find other relevant human-written documents.
We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks.
arXiv Detail & Related papers (2024-09-03T17:54:40Z) - SEC-QA: A Systematic Evaluation Corpus for Financial QA [12.279234447220155]
Existing datasets are often constrained by size, context, or relevance to practical applications.
We propose SEC-QA, a continuous dataset generation framework with two key features.
We introduce a QA system based on program-of-thought that improves the ability to perform complex information retrieval and quantitative reasoning pipelines.
arXiv Detail & Related papers (2024-06-20T15:12:41Z) - DocFinQA: A Long-Context Financial Reasoning Dataset [17.752081303855263]
We introduce a long-document financial QA task.
We extend the average context length from under 700 words in FinQA to 123k words in DocFinQA.
We conduct extensive experiments over retrieval-based QA pipelines and long-context language models.
arXiv Detail & Related papers (2024-01-12T22:19:22Z) - On Task-personalized Multimodal Few-shot Learning for Visually-rich
Document Entity Retrieval [59.25292920967197]
Few-shot document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z) - Multimodal Document Analytics for Banking Process Automation [4.541582055558865]
The paper contributes original empirical evidence on the effectiveness and efficiency of multi-model models for document processing in the banking business.
It offers practical guidance on how to unlock this potential in day-to-day operations.
arXiv Detail & Related papers (2023-07-21T18:29:04Z) - DocumentNet: Bridging the Data Gap in Document Pre-Training [78.01647768018485]
We propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models.
The collected dataset, named DocumentNet, does not depend on specific document types or entity sets.
Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training.
arXiv Detail & Related papers (2023-06-15T08:21:15Z) - Diffusion Model is an Effective Planner and Data Synthesizer for
Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (textscMTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find textscMTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z) - FETILDA: An Effective Framework For Fin-tuned Embeddings For Long
Financial Text Documents [14.269860621624394]
We propose and implement a deep learning framework that splits long documents into chunks and utilize pre-trained LMs to process and aggregate the chunks into vector representations.
We evaluate our framework on a collection of 10-K public disclosure reports from US banks, and another dataset of reports submitted by US companies.
arXiv Detail & Related papers (2022-06-14T16:14:14Z) - MuLD: The Multitask Long Document Benchmark [4.835289158553091]
We present a new long document benchmark consisting of only documents over 10,000 tokens.
We show that models with increased context length are better able to solve the tasks presented.
arXiv Detail & Related papers (2022-02-15T12:42:55Z) - SciREX: A Challenge Dataset for Document-Level Information Extraction [56.83748634747753]
It is challenging to create a large-scale information extraction dataset at the document level.
We introduce SciREX, a document level IE dataset that encompasses multiple IE tasks.
We develop a neural model as a strong baseline that extends previous state-of-the-art IE models to document-level IE.
arXiv Detail & Related papers (2020-05-01T17:30:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.