Multimodal Document Analytics for Banking Process Automation
- URL: http://arxiv.org/abs/2307.11845v2
- Date: Sun, 26 Nov 2023 08:57:44 GMT
- Title: Multimodal Document Analytics for Banking Process Automation
- Authors: Christopher Gerling, Stefan Lessmann
- Abstract summary: The paper contributes original empirical evidence on the effectiveness and efficiency of multimodal models for document processing in the banking business.
It offers practical guidance on how to unlock this potential in day-to-day operations.
- Score: 4.541582055558865
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Traditional banks face increasing competition from FinTechs in the rapidly
evolving financial ecosystem. Raising operational efficiency is vital to
address this challenge. Our study aims to improve the efficiency of
document-intensive business processes in banking. To that end, we first review
the landscape of business documents in the retail segment. Banking documents
often contain text, layout, and visuals, suggesting that document analytics and
process automation require more than plain natural language processing (NLP).
To verify this and assess the incremental value of visual cues when processing
business documents, we compare a recently proposed multimodal model called
LayoutXLM to powerful text classifiers (e.g., BERT) and large language models
(e.g., GPT) in a case study related to processing company register extracts.
The results confirm that incorporating layout information in a model
substantially increases its performance. Interestingly, we also observed that
more than 75% of the best model performance (in terms of the F1 score) can be
achieved with as little as 30% of the training data. This shows that the demand
for labeled data to set up a multimodal model can be moderate, which
simplifies real-world applications of multimodal document analytics. Our study
also sheds light on specific practices for calibrating a
multimodal banking document classifier, including the need for fine-tuning. In
sum, the paper contributes original empirical evidence on the effectiveness and
efficiency of multimodal models for document processing in the banking
business and offers practical guidance on how to unlock this potential in
day-to-day operations.
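To make the kind of setup discussed in the abstract concrete, the sketch below shows how a LayoutXLM-style multimodal classifier could be fine-tuned for banking document classification with the Hugging Face Transformers API. This is a minimal illustration, not the authors' implementation: the document classes, file path, OCR words, and bounding boxes are hypothetical placeholders, and the paper's actual labels, data, and hyperparameters are not reproduced here.

```python
# Minimal sketch: fine-tuning LayoutXLM for document classification.
# Classes, file names, words, and boxes below are hypothetical examples.
# Note: LayoutXLM/LayoutLMv2's visual backbone requires detectron2 to be installed.
import torch
from PIL import Image
from transformers import LayoutXLMProcessor, LayoutLMv2ForSequenceClassification

LABELS = ["register_extract", "id_document", "payslip", "other"]  # hypothetical classes

# LayoutXLM checkpoints plug into the LayoutLMv2 model classes.
processor = LayoutXLMProcessor.from_pretrained(
    "microsoft/layoutxlm-base", apply_ocr=False  # we supply OCR words/boxes ourselves
)
model = LayoutLMv2ForSequenceClassification.from_pretrained(
    "microsoft/layoutxlm-base", num_labels=len(LABELS)
)

def encode_page(image_path, words, boxes):
    """Encode one scanned page. `words`/`boxes` come from an external OCR engine;
    pixel boxes (x0, y0, x1, y1) are normalized to the 0-1000 range LayoutXLM expects."""
    image = Image.open(image_path).convert("RGB")
    w, h = image.size
    norm_boxes = [
        [int(1000 * x0 / w), int(1000 * y0 / h), int(1000 * x1 / w), int(1000 * y1 / h)]
        for (x0, y0, x1, y1) in boxes
    ]
    return processor(
        image, words, boxes=norm_boxes,
        truncation=True, padding="max_length", max_length=512, return_tensors="pt",
    )

# Single training step; in practice the labeled pages would go through a DataLoader or Trainer.
encoding = encode_page(
    "register_extract_page1.png",  # hypothetical scan
    words=["Handelsregister", "Abteilung", "B"],
    boxes=[(34, 20, 210, 45), (220, 20, 310, 45), (315, 20, 330, 45)],
)
labels = torch.tensor([LABELS.index("register_extract")])
outputs = model(**encoding, labels=labels)
outputs.loss.backward()  # pair with an optimizer such as AdamW for fine-tuning
```

The same loop can be run with a text-only baseline (e.g., a BERT sequence classifier over the OCR text alone) to estimate the incremental value of the layout and image inputs, which is the comparison the study reports.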
Related papers
- MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking [0.283600654802951]
We present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal datasets.
We introduce a dynamic perceiver-based model that can handle inputs from multiple modalities of arbitrary lengths.
Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset.
arXiv Detail & Related papers (2024-07-18T01:33:20Z)
- LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents [4.924255992661131]
We introduce LongFin, a multimodal document AI model capable of encoding up to 4K tokens.
We also propose the LongForms dataset that encapsulates several industrial challenges in financial documents.
arXiv Detail & Related papers (2024-01-26T18:23:45Z)
- On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Few-shot visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
- Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z)
- A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking [78.2700757742992]
Task-oriented dialogue systems often employ a Dialogue State Tracker (DST) to successfully complete conversations.
Recent state-of-the-art DST implementations rely on schemata of diverse services to improve model robustness.
We propose a single multi-task BERT-based model that jointly solves the three DST tasks of intent prediction, requested slot prediction and slot filling.
arXiv Detail & Related papers (2022-07-02T13:27:59Z)
- FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents [14.269860621624394]
We propose and implement a deep learning framework that splits long documents into chunks and utilizes pre-trained LMs to process and aggregate the chunks into vector representations.
We evaluate our framework on a collection of 10-K public disclosure reports from US banks, and another dataset of reports submitted by US companies.
arXiv Detail & Related papers (2022-06-14T16:14:14Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Data-Efficient Information Extraction from Form-Like Documents [14.567098292973075]
A key challenge is that form-like documents can be laid out in virtually infinitely many ways.
Data efficiency is critical to enable information extraction systems to scale to handle hundreds of different document-types.
arXiv Detail & Related papers (2022-01-07T19:16:49Z)
- Single-Modal Entropy based Active Learning for Visual Question Answering [75.1682163844354]
We address Active Learning in the multi-modal setting of Visual Question Answering (VQA).
In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition.
Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks.
arXiv Detail & Related papers (2021-10-21T05:38:45Z)
- An Intelligent Hybrid Model for Identity Document Classification [0.0]
Digitization may provide opportunities (e.g., increase in productivity, disaster recovery, and environmentally friendly solutions) and challenges for businesses.
One of the main challenges would be to accurately classify numerous scanned documents uploaded every day by customers.
Few studies have addressed this challenge as an image classification problem.
The proposed approach has been implemented using Python and experimentally validated on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-06-07T13:08:00Z)
- Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.