Related papers: Spatial Information Integration in Small Language Models for Document Layout Generation and Classification

Spatial Information Integration in Small Language Models for Document Layout Generation and Classification

URL: http://arxiv.org/abs/2501.05497v1
Date: Thu, 09 Jan 2025 17:20:00 GMT
Title: Spatial Information Integration in Small Language Models for Document Layout Generation and Classification
Authors: Pablo Melendez, Clemens Havas,
Abstract summary: Document layout understanding is a field of study that analyzes the spatial arrangement of information in a document to understand its structure and layout.<n>While semi-structured data is common in everyday life (balance sheets, purchase orders, receipts), there is a lack of public datasets for training machine learning models for this type of document.<n>We propose a method to generate new, synthetic, layout information that can help overcoming this data shortage.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Document layout understanding is a field of study that analyzes the spatial arrangement of information in a document hoping to understand its structure and layout. Models such as LayoutLM (and its subsequent iterations) can understand semi-structured documents with SotA results; however, the lack of open semi-structured data is a limitation in itself. While semi-structured data is common in everyday life (balance sheets, purchase orders, receipts), there is a lack of public datasets for training machine learning models for this type of document. In this investigation we propose a method to generate new, synthetic, layout information that can help overcoming this data shortage. According to our results, the proposed method performs better than LayoutTransformer, another popular layout generation method. We also show that, in some scenarios, text classification can improve when supported by bounding box information.

Related papers

DREAM: Document Reconstruction via End-to-end Autoregressive Model [53.51754520966657]
We present an innovative autoregressive model specifically designed for document reconstruction, referred to as Document Reconstruction via End-to-end Autoregressive Model (DREAM)<n>We establish a standardized definition of the document reconstruction task, and introduce a novel Document Similarity Metric (DSM) and DocRec1K dataset for assessing the performance of the task.
arXiv Detail & Related papers (2025-07-08T09:24:07Z)
Relation-Rich Visual Document Generator for Visual Information Extraction [12.4941229258054]
We propose a Relation-rIch visual Document GEnerator (RIDGE) that addresses these limitations through a two-stage approach. Our method significantly enhances the performance of document understanding models on various VIE benchmarks.
arXiv Detail & Related papers (2025-04-14T19:19:26Z)
Diachronic Document Dataset for Semantic Layout Analysis [9.145289299764991]
This dataset includes 7,254 annotated pages spanning a large temporal range (1600-2024) of digitised and born-digital materials. By incorporating content from different periods and genres, it addresses varying layout complexities and historical changes in document structure. We evaluate object detection models on this dataset, examining the impact of input size and subset-based training.
arXiv Detail & Related papers (2024-11-15T09:33:13Z)
Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents [31.434507306952458]
We propose KNN-former, which incorporates a new kind of bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We also use matching spatial to address the one-to-one mapping property that exists in many documents. Our method is highly-efficient compared to existing approaches in terms of the number of trainable parameters.
arXiv Detail & Related papers (2024-05-08T10:10:38Z)
LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain [2.3999111269325266]
Large Language Models (LLMs) have enabled the creation of customized text output satisfying user requests. We propose a novel approach that combines the LLMs with prompt engineering and multi-agent systems for generating new documents compliant with a desired structure.
arXiv Detail & Related papers (2024-02-21T13:54:53Z)
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model. We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning [114.54944761345594]
We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods. Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2023-08-10T03:09:12Z)
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents [54.744701806413204]
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. We test whether layout-infused LMs are robust to layout distribution shifts.
arXiv Detail & Related papers (2023-06-01T18:01:33Z)
Transforming Unstructured Text into Data with Context Rule Assisted Machine Learning (CRAML) [0.0]
The Context Rule Assisted Machine Learning (CRAML) method allows accurate and reproducible labeling of massive volumes of unstructured text. CRAML enables domain experts to access uncommon constructs buried within a document corpus. We present three use cases for CRAML: we analyze recent management literature that draws from text data, describe and release new machine learning models from an analysis of proprietary job advertisement text, and present findings of social and economic interest from a public corpus of franchise documents.
arXiv Detail & Related papers (2023-01-20T13:12:35Z)
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE, a novel document pre-training solution with layout knowledge enhancement. We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, and learn the proper reading order of documents. Experimental results show ERNIE achieves superior performance on various downstream tasks, setting new state-of-the-art on key information, and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z)
RDU: A Region-based Approach to Form-style Document Understanding [69.29541701576858]
Key Information Extraction (KIE) is aimed at extracting structured information from form-style documents. We develop a new KIE model named Region-based Understanding Document (RDU) RDU takes as input the text content and corresponding coordinates of a document, and tries to predict the result by localizing a bounding-box-like region.
arXiv Detail & Related papers (2022-06-14T14:47:48Z)
Value Retrieval with Arbitrary Queries for Form-like Documents [50.5532781148902]
We propose value retrieval with arbitrary queries for form-like documents. Our method predicts target value for an arbitrary query based on the understanding of layout and semantics of a form. We propose a simple document language modeling (simpleDLM) strategy to improve document understanding on large-scale model pre-training.
arXiv Detail & Related papers (2021-12-15T01:12:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.