Related papers: GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

URL: http://arxiv.org/abs/2402.11401v2
Date: Tue, 20 Feb 2024 18:25:23 GMT
Title: GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation
Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, and Umapada Pal
Abstract summary: Object detection in documents is a key step to automate the structural elements identification process. We present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image.
Score: 14.511401955827875
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource-constrained devices. Knowledge distillation allows us to create smaller and more efficient models that retain much of the performance of their larger counterparts. Here we present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image. We design a structured graph with nodes containing proposal-level features and edges representing the relationships between the different proposal regions. To reduce text bias, an adaptive node sampling strategy is designed to prune the weight distribution and put more weight on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss by effectively capturing both local and global information concurrently. Extensive experimentation on competitive benchmarks demonstrates that the proposed framework outperforms the current state-of-the-art approaches. The code will be available at: https://github.com/ayanban011/GraphKD.
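
The graph idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that); it is a toy example under stated assumptions. Nodes are proposal-level feature vectors, edges are pairwise cosine similarities between proposals, text nodes are down-weighted by a fixed factor, and the loss is a weighted node term plus an edge term. The function names, the cosine edge definition, the `text_weight` and `edge_coeff` parameters, and the squared-error form are all hypothetical choices made for illustration. The sketch also assumes the teacher and student produce features of the same dimension for the same proposals.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(features):
    """Return (nodes, edges): nodes are the proposal features,
    edges[i][j] is the cosine similarity between proposals i and j.
    Assumption: cosine similarity stands in for the paper's edge relation."""
    n = len(features)
    edges = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    return features, edges


def node_weights(is_text, text_weight=0.5):
    """Hypothetical weighting: text nodes get a smaller weight than non-text
    nodes; the weights are normalised to sum to 1."""
    raw = [text_weight if t else 1.0 for t in is_text]
    total = sum(raw)
    return [w / total for w in raw]


def graph_distill_loss(teacher_feats, student_feats, is_text, edge_coeff=1.0):
    """Toy distillation loss: weighted squared error between matching nodes
    (local term) plus mean squared error between edge matrices (global term)."""
    t_nodes, t_edges = build_graph(teacher_feats)
    s_nodes, s_edges = build_graph(student_feats)
    weights = node_weights(is_text)

    node_loss = sum(
        w * sum((a - b) ** 2 for a, b in zip(t, s))
        for w, t, s in zip(weights, t_nodes, s_nodes)
    )
    n = len(t_edges)
    edge_loss = sum(
        (t_edges[i][j] - s_edges[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)
    return node_loss + edge_coeff * edge_loss


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    student = [[1.0, 0.0], [1.0, 0.0]]
    # First proposal is a text region, second is a non-text region.
    print(graph_distill_loss(teacher, student, is_text=[True, False]))
```

In this sketch the node term compares each student proposal with its teacher counterpart, while the edge term compares how proposals relate to one another, which is one simple way to read "local and global information" in the abstract.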

Related papers

Enhancing Document AI Data Generation Through Graph-Based Synthetic Layouts [0.8245350546263803]
We propose a novel approach to synthetic document layout generation using Graph Neural Networks (GNNs). By representing document elements as nodes in a graph, GNNs are trained to generate realistic and diverse document layouts. Our experimental results show that graph-augmented document layouts outperform existing augmentation techniques.
arXiv Detail & Related papers (2024-11-27T21:15:02Z)
Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification [20.434941308959786]
Long document classification presents challenges due to their extensive content and complex structure. Existing methods often struggle with token limits and fail to adequately model hierarchical relationships within documents. Our approach integrates syntax trees for sentence encodings and document graphs for document encodings, which capture fine-grained syntactic relationships and broader document contexts.
arXiv Detail & Related papers (2024-10-03T19:25:01Z)
Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time. Our results on FUNSD, CORD, XFUNDIE show that our method can effectively improve the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data. The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z)
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model. We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a well-known problem in the document research community. With growing internet connectivity in personal life, an enormous number of documents have become available in the public domain. We address this challenge using self-supervision and unlike, the few existing self-supervised document segmentation approaches.
arXiv Detail & Related papers (2023-05-01T12:47:55Z)
Document-level Relation Extraction with Cross-sentence Reasoning Graph [14.106582119686635]
Relation extraction (RE) has recently moved from the sentence level to the document level. We propose a novel document-level RE model with a GRaph information Aggregation and Cross-sentence Reasoning network (GRACR). Experimental results show GRACR achieves excellent performance on two public datasets of document-level RE.
arXiv Detail & Related papers (2023-03-07T14:14:12Z)
FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations [114.94628499698096]
We propose FactGraph, a method that decomposes the document and the summary into structured meaning representations (MRs). MRs describe core semantic concepts and their relations, aggregating the main content in both document and summary in a canonical form, and reducing data sparsity. Experiments on different benchmarks for evaluating factuality show that FactGraph outperforms previous approaches by up to 15%.
arXiv Detail & Related papers (2022-04-13T16:45:33Z)
A Multi-purposed Unsupervised Framework for Comparing Embeddings of Undirected and Directed Graphs [0.0]
We extend the framework for evaluating graph embeddings that was recently introduced by the authors. A good embedding should capture the underlying graph topology and structure, node-to-node relationship, and other relevant information. The framework is flexible, scalable, and can deal with undirected/directed, weighted/unweighted graphs.
arXiv Detail & Related papers (2021-11-30T20:20:30Z)
Self-supervised Graph-level Representation Learning with Local and Global Structure [71.45196938842608]
We propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning. Besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters. An efficient online expectation-maximization (EM) algorithm is further developed for learning the model.
arXiv Detail & Related papers (2021-06-08T05:25:38Z)
Coarse-to-Fine Entity Representations for Document-level Relation Extraction [28.39444850200523]
Document-level Relation Extraction (RE) requires extracting relations expressed within and across sentences. Recent works show that graph-based methods, usually constructing a document-level graph that captures document-aware interactions, can obtain useful entity representations. We propose the Coarse-to-Fine Entity Representation model (CFER) that adopts a coarse-to-fine strategy.
arXiv Detail & Related papers (2020-12-04T10:18:59Z)
Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents. Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents. Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.