Related papers: Spatial ModernBERT: Spatial-Aware Transformer for Table and Key-Value Extraction in Financial Documents at Scale

URL: http://arxiv.org/abs/2507.08865v1
Date: Wed, 09 Jul 2025 14:40:40 GMT
Title: Spatial ModernBERT: Spatial-Aware Transformer for Table and Key-Value Extraction in Financial Documents at Scale
Authors: Javis AI Team, Amrendra Singh, Maulik Shah, Dharshan Sampath,
Abstract summary: We introduce Spatial ModernBERT, a transformer-based model augmented with spatial embeddings. Extracting tables and key-value pairs from financial documents is essential for business workflows such as auditing, data analytics, and automated invoice processing.
Score: 0.5062312533373298
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Extracting tables and key-value pairs from financial documents is essential for business workflows such as auditing, data analytics, and automated invoice processing. In this work, we introduce Spatial ModernBERT, a transformer-based model augmented with spatial embeddings, to accurately detect and extract tabular data and key-value fields from complex financial documents. We cast the extraction task as token classification across three heads: (1) Label Head, classifying each token as a label (e.g., PO Number, PO Date, Item Description, Quantity, Base Cost, MRP, etc.); (2) Column Head, predicting column indices; (3) Row Head, distinguishing the start of item rows and header rows. The model is pretrained on the PubTables-1M dataset, then fine-tuned on a financial document dataset, achieving robust performance through cross-entropy loss on each classification head. We propose a post-processing method to merge tokens using B-I-IB tagging, reconstruct the tabular layout, and extract key-value pairs. Empirical evaluation shows that Spatial ModernBERT effectively leverages both textual and spatial cues, facilitating highly accurate table and key-value extraction in real-world financial documents.
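Illustrative sketch: the abstract's layout-reconstruction step (rebuilding a table from per-token column and row predictions) might look roughly like the Python below. This is a simplified illustration under stated assumptions, not the paper's implementation: the `Token` record, its field names, and the `rebuild_table` helper are hypothetical, the Row Head output is reduced to a single row-start flag (header rows are not treated separately), and the B-I-IB merging scheme is not reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    # Hypothetical per-token prediction record (field names are assumptions).
    text: str        # token text
    col: int         # predicted column index (stand-in for the Column Head)
    row_start: bool  # True if the token starts a new row (stand-in for the Row Head)


def rebuild_table(tokens: List[Token], n_cols: int) -> List[List[str]]:
    """Group tokens, given in reading order, into rows of cell strings."""
    rows: List[List[List[str]]] = []
    for tok in tokens:
        # Open a new row on a row-start flag, or if no row exists yet.
        if tok.row_start or not rows:
            rows.append([[] for _ in range(n_cols)])
        # Ignore tokens whose predicted column falls outside the table.
        if 0 <= tok.col < n_cols:
            rows[-1][tok.col].append(tok.text)
    # Join the tokens collected for each cell into one string.
    return [[" ".join(cell) for cell in row] for row in rows]


tokens = [
    Token("Blue", 0, True), Token("pen", 0, False),
    Token("10", 1, False), Token("2.50", 2, False),
    Token("Notebook", 0, True), Token("4", 1, False), Token("3.00", 2, False),
]
table = rebuild_table(tokens, n_cols=3)
print(table)
# → [['Blue pen', '10', '2.50'], ['Notebook', '4', '3.00']]
```

In this sketch, tokens sharing a row and column are simply concatenated in reading order; a fuller pipeline would presumably also use the label predictions to name the columns and to separate key-value fields from line items.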

Related papers

Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance [54.25184684077833]
We propose an efficient and scalable method for extracting quantitative insights from unstructured financial documents. Our proposed system consists of two specialized agents: the Extraction Agent and the Text-to-Agent
arXiv Detail & Related papers (2025-05-25T15:45:46Z)
TabSniper: Towards Accurate Table Detection & Structure Recognition for Bank Statements [1.9461727843485295]
Existing table structure recognition approaches produce suboptimal results for long, complex tables. This paper proposes TabSniper, a novel approach for efficient table detection, categorization and structure recognition from bank statements.
arXiv Detail & Related papers (2024-12-17T11:47:59Z)
Leveraging Foundation Language Models (FLMs) for Automated Cohort Extraction from Large EHR Databases [50.552056536968166]
We propose and evaluate an algorithm for automating column matching on two large, popular and publicly accessible EHR databases. Our approach achieves a high top-three accuracy of 92%, correctly matching 12 out of the 13 columns of interest, when using a small, pre-trained general purpose language model.
arXiv Detail & Related papers (2024-12-16T06:19:35Z)
Evaluation of Table Representations to Answer Questions from Tables in Documents : A Case Study using 3GPP Specifications [0.650923326742559]
The representation of a table in terms of what is a relevant chunk is not obvious. Row-level representations that include the corresponding table header information in every cell improve retrieval performance.
arXiv Detail & Related papers (2024-08-30T04:40:35Z)
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [51.66718740300016]
TableLLM is a robust large language model (LLM) with 8 billion parameters. TableLLM is purpose-built for proficiently handling data manipulation tasks. We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
arXiv Detail & Related papers (2024-03-28T11:21:12Z)
AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification [14.386767741945256]
AMuRD is a novel multilingual human-annotated dataset specifically designed for information extraction from receipts. Each sample includes annotations for item names and attributes such as price, brand, and more. This detailed annotation facilitates a comprehensive understanding of each item on the receipt.
arXiv Detail & Related papers (2023-09-18T14:18:19Z)
DocILE Benchmark for Document Information Localization and Extraction [7.944448547470927]
This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training.
arXiv Detail & Related papers (2023-02-11T11:32:10Z)
SpreadsheetCoder: Formula Prediction from Semi-structured Context [70.41579328458116]
We propose a BERT-based model architecture to represent the tabular context in both row-based and column-based formats. We train our model on a large dataset of spreadsheets, and demonstrate that SpreadsheetCoder achieves top-1 prediction accuracy of 42.51%. Compared to the rule-based system, SpreadsheetCoder assists 82% more users in composing formulas on Google Sheets.
arXiv Detail & Related papers (2021-06-26T11:26:27Z)
Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images [0.6299766708197884]
This research proposes automated table detection and tabular data extraction from financial PDF documents. We propose a method that consists of three main processes, including detecting table areas with a Faster R-CNN (Region-based Convolutional Neural Network) model. The excellent table detection performance of the detection model is obtained from our customized dataset.
arXiv Detail & Related papers (2021-02-20T08:21:17Z)
A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem. Given a query (e.g., a question), return the set of relevant documents from a large document corpus. We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks.
arXiv Detail & Related papers (2020-02-10T16:44:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.