GraphKD: Exploring Knowledge Distillation Towards Document Object
Detection with Structured Graph Creation
- URL: http://arxiv.org/abs/2402.11401v2
- Date: Tue, 20 Feb 2024 18:25:23 GMT
- Title: GraphKD: Exploring Knowledge Distillation Towards Document Object
Detection with Structured Graph Creation
- Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, and Umapada Pal
- Abstract summary: Object detection in documents is a key step to automate the structural elements identification process.
We present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image.
- Score: 14.511401955827875
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Object detection in documents is a key step to automate the structural
elements identification process in a digital or scanned document through
understanding the hierarchical structure and relationships between different
elements. Large and complex models, while achieving high accuracy, can be
computationally expensive and memory-intensive, making them impractical for
deployment on resource constrained devices. Knowledge distillation allows us to
create small and more efficient models that retain much of the performance of
their larger counterparts. We present a graph-based knowledge distillation
framework to correctly identify and localize the document objects in a document
image. We design a structured graph whose nodes contain proposal-level
features and whose edges represent the relationships between the different
proposal regions. To reduce text bias, an adaptive node sampling strategy is
designed to prune the weight distribution and place more weight on non-text
nodes. We encode the complete graph as a knowledge representation and transfer
it from the teacher to the student through the proposed distillation loss by
effectively capturing both local and global information concurrently. Extensive
experimentation on competitive benchmarks demonstrates that the proposed
framework outperforms the current state-of-the-art approaches. The code will be
available at: https://github.com/ayanban011/GraphKD.
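
The idea in the abstract (a graph over region proposals, lower weight on text nodes, and a loss that matches both node features and edge relations) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity edges, the fixed `text_weight` factor, the squared-error terms, and all function names below are assumptions chosen for illustration, and the teacher and student proposals are assumed to be aligned one-to-one with equal feature dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def build_edges(node_feats):
    """Dense edge matrix; edge (i, j) is the cosine similarity of proposals i and j.

    Assumption: the paper only says edges represent relationships between
    proposal regions; cosine similarity is a stand-in.
    """
    n = len(node_feats)
    return [[cosine(node_feats[i], node_feats[j]) for j in range(n)] for i in range(n)]


def node_weights(is_text, text_weight=0.5):
    """Normalized per-node weights that down-weight text nodes (hypothetical scheme)."""
    raw = [text_weight if t else 1.0 for t in is_text]
    total = sum(raw)
    return [w / total for w in raw]


def graph_distill_loss(teacher_feats, student_feats, is_text,
                       text_weight=0.5, edge_coeff=1.0):
    """Weighted node-feature error (local) plus edge-matrix error (global)."""
    weights = node_weights(is_text, text_weight)
    node_loss = sum(
        w * sum((t - s) ** 2 for t, s in zip(tf, sf))
        for w, tf, sf in zip(weights, teacher_feats, student_feats)
    )
    edges_t = build_edges(teacher_feats)
    edges_s = build_edges(student_feats)
    n = len(teacher_feats)
    edge_loss = sum(
        (edges_t[i][j] - edges_s[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)
    return node_loss + edge_coeff * edge_loss


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    student = [[1.0, 0.0], [1.0, 0.0]]
    print(graph_distill_loss(teacher, student, is_text=[False, False]))
```

The node term plays the role of local information and the edge term the role of global structure; in a real detector both would be computed on learned proposal features and backpropagated through the student only.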
Related papers
- Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification [20.434941308959786]
Long document classification presents challenges due to their extensive content and complex structure.
Existing methods often struggle with token limits and fail to adequately model hierarchical relationships within documents.
Our approach integrates syntax trees for sentence encodings and document graphs for document encodings, which capture fine-grained syntactic relationships and broader document contexts.
arXiv Detail & Related papers (2024-10-03T19:25:01Z)
- Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, XFUNDIE show that our method can effectively improve the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
- Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z)
- Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
- SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a well-known problem in the document research community.
With growing internet connectivity in personal life, an enormous number of documents has become available in the public domain.
We address this challenge using self-supervision and, unlike the few existing self-supervised document segmentation approaches, ...
arXiv Detail & Related papers (2023-05-01T12:47:55Z)
- Document-level Relation Extraction with Cross-sentence Reasoning Graph [14.106582119686635]
Relation extraction (RE) has recently moved from the sentence level to the document level.
We propose a novel document-level RE model with a GRaph information Aggregation and Cross-sentence Reasoning network (GRACR).
Experimental results show GRACR achieves excellent performance on two public datasets of document-level RE.
arXiv Detail & Related papers (2023-03-07T14:14:12Z)
- FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations [114.94628499698096]
We propose FactGraph, a method that decomposes the document and the summary into structured meaning representations (MRs).
MRs describe core semantic concepts and their relations, aggregating the main content in both document and summary in a canonical form, and reducing data sparsity.
Experiments on different benchmarks for evaluating factuality show that FactGraph outperforms previous approaches by up to 15%.
arXiv Detail & Related papers (2022-04-13T16:45:33Z)
- A Multi-purposed Unsupervised Framework for Comparing Embeddings of Undirected and Directed Graphs [0.0]
We extend the framework for evaluating graph embeddings that was recently introduced by the authors.
A good embedding should capture the underlying graph topology and structure, node-to-node relationship, and other relevant information.
The framework is flexible, scalable, and can deal with undirected/directed, weighted/unweighted graphs.
arXiv Detail & Related papers (2021-11-30T20:20:30Z)
- Self-supervised Graph-level Representation Learning with Local and Global Structure [71.45196938842608]
We propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning.
Besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters.
An efficient online expectation-maximization (EM) algorithm is further developed for learning the model.
arXiv Detail & Related papers (2021-06-08T05:25:38Z)
- Coarse-to-Fine Entity Representations for Document-level Relation Extraction [28.39444850200523]
Document-level Relation Extraction (RE) requires extracting relations expressed within and across sentences.
Recent works show that graph-based methods, usually constructing a document-level graph that captures document-aware interactions, can obtain useful entity representations.
We propose the Coarse-to-Fine Entity Representation model (CFER) that adopts a coarse-to-fine strategy.
arXiv Detail & Related papers (2020-12-04T10:18:59Z)
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.