A Graphical Approach to Document Layout Analysis
- URL: http://arxiv.org/abs/2308.02051v1
- Date: Thu, 3 Aug 2023 21:09:59 GMT
- Title: A Graphical Approach to Document Layout Analysis
- Authors: Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim
Sokolov, Vadym Barda, Delphine Vendryes, and Chris Tanner
- Abstract summary: Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document.
Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs.
We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network.
- Score: 2.5108258530670606
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Document layout analysis (DLA) is the task of detecting the distinct,
semantic content within a document and correctly classifying these items into
an appropriate category (e.g., text, title, figure). DLA pipelines enable users
to convert documents into structured machine-readable formats that can then be
used for many useful downstream tasks. Most existing state-of-the-art (SOTA)
DLA models represent documents as images, discarding the rich metadata
available in electronically generated PDFs. Directly leveraging this metadata,
we represent each PDF page as a structured graph and frame the DLA problem as a
graph segmentation and classification problem. We introduce the Graph-based
Layout Analysis Model (GLAM), a lightweight graph neural network competitive
with SOTA models on two challenging DLA datasets - while being an order of
magnitude smaller than existing models. In particular, the 4-million parameter
GLAM model outperforms the leading 140M+ parameter computer vision-based model
on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two
models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8
to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making
GLAM a favorable engineering choice for DLA tasks.
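The graph framing described in the abstract can be illustrated with a minimal sketch. This is not the GLAM implementation: the paper describes a graph neural network, whereas the code below swaps in a fixed distance threshold for learned edge predictions and omits the classification step entirely. The `Box` schema, the `max_gap` parameter, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """A text box as it might come from PDF metadata (hypothetical schema)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def gap(a: Box, b: Box) -> float:
    """Euclidean gap between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def build_edges(boxes, max_gap):
    """Keep an edge between every pair of boxes closer than max_gap.

    Assumption: this threshold stands in for a learned edge classifier.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if gap(boxes[i], boxes[j]) <= max_gap
    ]


def segment(n, edges):
    """Group n nodes into connected components using union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def bounding_box(boxes, members):
    """Smallest rectangle covering all boxes in one segment."""
    return (
        min(boxes[i].x0 for i in members),
        min(boxes[i].y0 for i in members),
        max(boxes[i].x1 for i in members),
        max(boxes[i].y1 for i in members),
    )


# Toy page: a title, two lines of one paragraph, and a caption.
boxes = [
    Box("Title", 50, 40, 300, 60),
    Box("First line of", 50, 100, 300, 112),
    Box("a paragraph.", 50, 114, 200, 126),
    Box("Figure 1", 50, 300, 150, 312),
]

edges = build_edges(boxes, max_gap=5.0)
segments = segment(len(boxes), edges)
for members in segments:
    print([boxes[i].text for i in members], bounding_box(boxes, members))
```

On this toy page the two paragraph lines merge into one segment, while the title and caption stay separate. In a learned model, each segment would additionally receive a class label such as text, title, or figure.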
Related papers
- Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy.
We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels.
Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z)
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [90.98855064914379]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs.
Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy.
We propose LLM4Graph datasets, which include crawled documents and auto-generated codes based on 6 widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z)
- Large Generative Graph Models [74.58859158271169]
We propose a new class of graph generative model called Large Graph Generative Model (LGGM).
The pre-trained LGGM has superior zero-shot generative capability to existing graph generative models.
LGGM can be easily fine-tuned with graphs from target domains and demonstrates even better performance than models trained directly from scratch.
arXiv Detail & Related papers (2024-06-07T17:41:47Z)
- LLaGA: Large Language and Graph Assistant [73.71990472543027]
Large Language and Graph Assistant (LLaGA) is an innovative model to handle the complexities of graph-structured data.
LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks.
Our experiments show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model.
arXiv Detail & Related papers (2024-02-13T02:03:26Z)
- Vision Grid Transformer for Document Layout Analysis [26.62857594455592]
We present VGT, a two-stream Vision Grid Transformer, in which Grid Transformer (GiT) is proposed and pre-trained for 2D token-level and segment-level semantic understanding.
Experiment results have illustrated that the proposed VGT model achieves new state-of-the-art results on document layout analysis tasks.
arXiv Detail & Related papers (2023-08-29T02:09:56Z)
- Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
- GVdoc: Graph-based Visual Document Classification [17.350393956461783]
We propose GVdoc, a graph-based document classification model.
Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings.
We show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data.
arXiv Detail & Related papers (2023-05-26T19:23:20Z)
- Text Representation Enrichment Utilizing Graph based Approaches: Stock Market Technical Analysis Case Study [0.0]
We propose a transductive hybrid approach composed of an unsupervised node representation learning model followed by a node classification/edge prediction model.
The proposed model is developed to classify stock market technical analysis reports, which to our knowledge is the first work in this domain.
arXiv Detail & Related papers (2022-11-29T11:26:08Z)
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis [2.9923891863939938]
Document layout analysis is a key requirement for high-quality PDF document conversion.
Deep-learning models have proven to be very effective at layout detection and segmentation.
We present DocLayNet, a new, publicly available document-annotation dataset.
arXiv Detail & Related papers (2022-06-02T14:25:12Z)
- Document Layout Analysis via Dynamic Residual Feature Fusion [10.670880187577778]
Document layout analysis (DLA) aims to split the document image into different interest regions and understand the role of each region.
Building a DLA system is challenging because training data is very limited and efficient models are lacking.
We propose an end-to-end united network named Dynamic Residual Fusion Network (DRFN) for the DLA task.
arXiv Detail & Related papers (2021-04-07T02:57:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.