Doc2Graph: a Task Agnostic Document Understanding Framework based on
Graph Neural Networks
- URL: http://arxiv.org/abs/2208.11168v1
- Date: Tue, 23 Aug 2022 19:48:10 GMT
- Title: Doc2Graph: a Task Agnostic Document Understanding Framework based on
Graph Neural Networks
- Authors: Andrea Gemelli and Sanket Biswas and Enrico Civitelli and Josep
Lladós and Simone Marinai
- Abstract summary: We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model.
We evaluate our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection.
- Score: 0.965964228590342
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Geometric Deep Learning has recently attracted significant interest in a wide
range of machine learning fields, including document analysis. The application
of Graph Neural Networks (GNNs) has become crucial in various document-related
tasks since they can unravel important structural patterns, fundamental in key
information extraction processes. Previous works in the literature propose
task-driven models and do not take into account the full power of graphs. We
propose Doc2Graph, a task-agnostic document understanding framework based on a
GNN model, to solve different tasks given different types of documents. We
evaluated our approach on two challenging datasets for key information
extraction in form understanding, invoice layout analysis and table detection.
Our code is freely accessible on https://github.com/andreagemelli/doc2graph.
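To make the graph formulation concrete, the sketch below shows one common way such a framework can be set up: each detected document entity (e.g., an OCR'd text region) becomes a graph node, nodes are connected by a layout-based adjacency, and a small message-passing network feeds two heads, one classifying nodes (e.g., semantic entity labeling) and one classifying candidate edges (e.g., entity linking). This is a minimal illustrative sketch in plain PyTorch, not the authors' implementation (see the repository above for that); the class SimpleDocGNN, its layers, and the uniform adjacency are assumptions made only for the example.

```python
import torch
import torch.nn as nn

class SimpleDocGNN(nn.Module):
    """Toy GNN with a node head (entity labeling) and an edge head (entity linking)."""
    def __init__(self, in_dim, hid_dim, n_node_classes, n_edge_classes):
        super().__init__()
        self.proj = nn.Linear(in_dim, hid_dim)
        self.msg = nn.Linear(hid_dim, hid_dim)            # one message-passing step
        self.node_head = nn.Linear(hid_dim, n_node_classes)
        self.edge_head = nn.Linear(2 * hid_dim, n_edge_classes)

    def forward(self, x, adj, edge_pairs):
        # x: (N, in_dim) node features (e.g., text embedding + bounding-box geometry)
        # adj: (N, N) row-normalized adjacency (e.g., k-NN over box centroids)
        # edge_pairs: (E, 2) candidate (source, target) node index pairs to classify
        h = torch.relu(self.proj(x))
        h = torch.relu(h + adj @ self.msg(h))             # aggregate neighbor messages
        node_logits = self.node_head(h)                   # one label per document entity
        pair = torch.cat([h[edge_pairs[:, 0]], h[edge_pairs[:, 1]]], dim=-1)
        edge_logits = self.edge_head(pair)                # one label per candidate link
        return node_logits, edge_logits

# Toy usage: 5 OCR regions with 16-dim features, all ordered pairs as candidate links.
x = torch.randn(5, 16)
adj = torch.full((5, 5), 1.0 / 5)                         # uniform neighbor weights
pairs = torch.tensor([(i, j) for i in range(5) for j in range(5) if i != j])
model = SimpleDocGNN(16, 32, n_node_classes=4, n_edge_classes=2)
node_logits, edge_logits = model(x, adj, pairs)
print(node_logits.shape, edge_logits.shape)               # torch.Size([5, 4]) torch.Size([20, 2])
```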
Related papers
- iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models [0.7165255458140439]
iText2KG is a method for incremental, topic-independent Knowledge Graph construction without post-processing.
Our method demonstrates superior performance compared to baseline methods across three scenarios.
arXiv Detail & Related papers (2024-09-05T06:49:14Z)
- Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models [11.959445364035734]
80% of enterprise data reside in unstructured files, stored in data lakes that accommodate heterogeneous formats.
We introduce Docs2KG, a novel framework designed to extract multimodal information from diverse and heterogeneous documents.
Docs2KG generates a unified knowledge graph that represents the extracted key information.
arXiv Detail & Related papers (2024-06-05T05:35:59Z)
- DocGraphLM: Documental Graph Language Model for Information Extraction [15.649726614383388]
We introduce DocGraphLM, a framework that combines pre-trained language models with graph semantics.
To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs.
Our experiments on three SotA datasets show consistent improvement on IE and QA tasks with the adoption of graph features.
arXiv Detail & Related papers (2024-01-05T14:15:36Z)
- One for All: Towards Training One Graph Model for All Classification Tasks [61.656962278497225]
A unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain.
We propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges.
OFA performs well across different tasks, making it the first general-purpose cross-domain classification model on graphs.
arXiv Detail & Related papers (2023-09-29T21:15:26Z)
- Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases [63.96793270418793]
Complex logical query answering (CLQA) is a recently emerged task of graph machine learning.
We introduce the concept of Neural Graph Databases (NGDBs).
NGDB consists of a Neural Graph Storage and a Neural Graph Engine.
arXiv Detail & Related papers (2023-03-26T04:03:37Z)
- GRATIS: Deep Learning Graph Representation with Task-specific Topology and Multi-dimensional Edge Features [27.84193444151138]
We propose the first general graph representation learning framework, called GRATIS.
It can generate a strong graph representation with a task-specific topology and task-specific multi-dimensional edge features from any arbitrary input.
Our framework is effective, robust and flexible, and is a plug-and-play module that can be combined with different backbones and Graph Neural Networks (GNNs).
arXiv Detail & Related papers (2022-11-19T18:42:55Z)
- Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search [96.31315520244605]
Arch-Graph is a transferable NAS method that predicts task-specific optimal architectures.
We show Arch-Graph's transferability and high sample efficiency across numerous tasks.
It is able to find architectures in the top 0.16% and 0.29% on average on two search spaces under a budget of only 50 models.
arXiv Detail & Related papers (2022-04-12T16:46:06Z)
- Multimodal Pre-training Based on Graph Attention Network for Document Understanding [32.55734039518983]
GraphDoc is a graph-based model for various document understanding tasks.
It is pre-trained in a multimodal framework by utilizing text, layout, and image information simultaneously.
It learns a generic representation from only 320k unlabeled documents.
arXiv Detail & Related papers (2022-03-25T09:27:50Z)
- Extracting Summary Knowledge Graphs from Long Documents [48.92130466606231]
We introduce a new text-to-graph task of predicting summarized knowledge graphs from long documents.
We develop a dataset of 200k document/graph pairs using automatic and human annotations.
arXiv Detail & Related papers (2020-09-19T04:37:33Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge may contain more than is needed, since the output description may only cover the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv Detail & Related papers (2020-04-30T14:16:19Z)
- Semantic Graphs for Generating Deep Questions [98.5161888878238]
We propose a novel framework which first constructs a semantic-level graph for the input document and then encodes the semantic graph by introducing an attention-based GGNN (Att-GGNN).
On the HotpotQA deep-question centric dataset, our model greatly improves performance on questions requiring reasoning over multiple facts, leading to state-of-the-art performance.
arXiv Detail & Related papers (2020-04-27T10:52:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.