Related papers: A Graphical Approach to Document Layout Analysis

URL: http://arxiv.org/abs/2308.02051v1
Date: Thu, 3 Aug 2023 21:09:59 GMT
Title: A Graphical Approach to Document Layout Analysis
Authors: Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim Sokolov, Vadym Barda, Delphine Vendryes, and Chris Tanner
Abstract summary: Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network.
Score: 2.5108258530670606
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets - while being an order of magnitude smaller than existing models. In particular, the 4-million parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.
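The graph framing described in the abstract can be illustrated with a minimal sketch. This is not GLAM's implementation: the `Box` type, the k-nearest-neighbour edge rule, and the `keep_edge` gap heuristic are all hypothetical stand-ins (the paper uses a learned graph neural network, whose details are not given here). The sketch only shows the general idea of turning PDF text boxes into graph nodes, linking nearby boxes, and reading layout segments off as connected components of the edges that survive classification.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Hypothetical text box taken from PDF metadata (origin assumed bottom-left)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def build_page_graph(boxes, k=2):
    """Link each box to its k nearest neighbours by center distance (assumed rule)."""
    edges = set()
    for i, a in enumerate(boxes):
        dists = sorted(
            (hypot(a.center[0] - b.center[0], a.center[1] - b.center[1]), j)
            for j, b in enumerate(boxes)
            if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def keep_edge(a, b, max_gap=5.0):
    """Stand-in for a learned edge classifier: keep edges with a small vertical gap."""
    gap = max(a.y0, b.y0) - min(a.y1, b.y1)
    return gap <= max_gap


def segment(num_nodes, kept_edges):
    """Group nodes into segments: connected components via union-find."""
    parent = list(range(num_nodes))

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, j in kept_edges:
        parent[find(i)] = find(j)
    groups = {}
    for n in range(num_nodes):
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Toy page: one title line and two tightly spaced body lines.
boxes = [
    Box("A Graphical Approach", 50, 700, 400, 720),
    Box("Document layout analysis is", 50, 640, 400, 652),
    Box("the task of detecting", 50, 626, 400, 638),
]
edges = build_page_graph(boxes, k=1)
kept = [(i, j) for i, j in edges if keep_edge(boxes[i], boxes[j])]
segments = segment(len(boxes), kept)
print(edges)     # [(0, 1), (1, 2)]
print(segments)  # [[0], [1, 2]]
```

In this toy page the title stays in its own segment while the two body lines merge into one. A real system would then classify each segment (e.g., text, title, figure) rather than rely on a fixed gap threshold.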

Related papers

In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding [113.17601814293722]
We introduce ChartScope, an LVLM optimized for in-depth chart comprehension across diverse chart types. We propose an efficient data generation pipeline that synthesizes paired data for a wide range of chart types. We also establish ChartDQA, a new benchmark for evaluating not only question-answering at different levels but also underlying data understanding.
arXiv Detail & Related papers (2025-07-18T18:15:09Z)
BRIDGES: Bridging Graph Modality and Large Language Models within EDA Tasks [12.683482535955314]
LLM performance suffers when graphs are represented as sequential text. We introduce BRIDGES, a framework designed to incorporate graph modality into LLMs for EDA tasks. Results demonstrate 2x to 10x improvements across multiple tasks compared to text-only baselines.
arXiv Detail & Related papers (2025-04-07T15:27:32Z)
Graphy'our Data: Towards End-to-End Modeling, Exploring and Generating Report from Raw Data [5.752510084651565]
Graphy is an end-to-end platform that automates data modeling, exploration and high-quality report generation. We showcase a pre-scraped graph of over 50,000 papers -- complete with their references -- demonstrating how Graphy facilitates the literature-survey scenario.
arXiv Detail & Related papers (2025-02-24T06:10:49Z)
Graph-based Document Structure Analysis [26.79096546002763]
We propose a novel graph-based Document Structure Analysis (gDSA) task. This task requires that the model not only detect document elements but also generate spatial and logical relations in the form of a graph structure. We construct a relation graph-based document structure analysis dataset (GraphDoc) with 80K document images and 4.13M relation annotations.
arXiv Detail & Related papers (2025-02-04T17:16:14Z)
An Automatic Graph Construction Framework based on Large Language Models for Recommendation [49.51799417575638]
We introduce AutoGraph, an automatic graph construction framework based on large language models for recommendation. LLMs infer the user preference and item knowledge, which is encoded as semantic vectors. Latent factors are incorporated as extra nodes to link the user/item nodes, resulting in a graph with in-depth global-view semantics.
arXiv Detail & Related papers (2024-12-24T07:51:29Z)
Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy. We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels. Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z)
Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models. Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models. Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [90.98855064914379]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. We propose LLM4Graph datasets, which include crawled documents and auto-generated codes based on 6 widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z)
Large Generative Graph Models [74.58859158271169]
We propose a new class of graph generative model called Large Graph Generative Model (LGGM). The pre-trained LGGM has superior zero-shot generative capability to existing graph generative models. LGGM can be easily fine-tuned with graphs from target domains and demonstrates even better performance than models trained directly from scratch.
arXiv Detail & Related papers (2024-06-07T17:41:47Z)
LLaGA: Large Language and Graph Assistant [73.71990472543027]
Large Language and Graph Assistant (LLaGA) is an innovative model to handle the complexities of graph-structured data. LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks. Our experiments show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model.
arXiv Detail & Related papers (2024-02-13T02:03:26Z)
Vision Grid Transformer for Document Layout Analysis [26.62857594455592]
We present VGT, a two-stream Vision Grid Transformer, in which Grid Transformer (GiT) is proposed and pre-trained for 2D token-level and segment-level semantic understanding. Experiment results have illustrated that the proposed VGT model achieves new state-of-the-art results on document layout analysis tasks.
arXiv Detail & Related papers (2023-08-29T02:09:56Z)
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model. We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
GVdoc: Graph-based Visual Document Classification [17.350393956461783]
We propose GVdoc, a graph-based document classification model. Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings. We show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data.
arXiv Detail & Related papers (2023-05-26T19:23:20Z)
Text Representation Enrichment Utilizing Graph based Approaches: Stock Market Technical Analysis Case Study [0.0]
We propose a transductive hybrid approach composed of an unsupervised node representation learning model followed by a node classification/edge prediction model. The proposed model is developed to classify stock market technical analysis reports, which to our knowledge is the first work in this domain.
arXiv Detail & Related papers (2022-11-29T11:26:08Z)
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis [2.9923891863939938]
Document layout analysis is a key requirement for high-quality PDF document conversion. Deep-learning models have proven to be very effective at layout detection and segmentation. We present DocLayNet, a new, publicly available document-annotation dataset.
arXiv Detail & Related papers (2022-06-02T14:25:12Z)
Document Layout Analysis via Dynamic Residual Feature Fusion [10.670880187577778]
Document layout analysis (DLA) aims to split the document image into different regions of interest and understand the role of each region. Building a DLA system is challenging because training data is very limited and efficient models are lacking. We propose an end-to-end unified network named Dynamic Residual Fusion Network (DRFN) for the DLA task.
arXiv Detail & Related papers (2021-04-07T02:57:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.