Related papers: GVdoc: Graph-based Visual Document Classification

URL: http://arxiv.org/abs/2305.17219v1
Date: Fri, 26 May 2023 19:23:20 GMT
Title: GVdoc: Graph-based Visual Document Classification
Authors: Fnu Mohbat, Mohammed J. Zaki, Catherine Finegan-Dollak, Ashish Verma
Abstract summary: We propose GVdoc, a graph-based document classification model. Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings. We show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data.
Score: 17.350393956461783
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The robustness of a model for real-world deployment is decided by how well it performs on unseen data and distinguishes between in-domain and out-of-domain samples. Visual document classifiers have shown impressive performance on in-distribution test sets. However, they tend to have a hard time correctly classifying and differentiating out-of-distribution examples. Image-based classifiers lack the text component, whereas multi-modality transformer-based models face the token serialization problem in visual documents due to their diverse layouts. They also require a lot of computing power during inference, making them impractical for many real-world applications. We propose GVdoc, a graph-based document classification model that addresses both of these challenges. Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings. Through experiments, we show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data while retaining comparable performance on the in-distribution test set.
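The abstract describes two steps: building a document graph from the layout, then learning node and graph embeddings over it. The sketch below illustrates that general idea only. It is not GVdoc's actual method: the k-nearest-neighbour edge rule, the unweighted mean aggregation, and the names `build_layout_graph`, `mean_aggregate`, and `graph_embedding` are all assumptions made for illustration, and a real graph neural network would use learned weights.

```python
import math


def build_layout_graph(boxes, k=2):
    """Connect each box to its k nearest neighbours by centre distance.

    boxes: list of (x0, y0, x1, y1) tuples, e.g. one per OCR token
    (hypothetical input format). Returns undirected edges (i, j), i < j.
    """
    centres = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    edges = set()
    for i, ci in enumerate(centres):
        # Rank every other node by Euclidean distance between centres.
        others = sorted(
            (j for j in range(len(centres)) if j != i),
            key=lambda j: math.dist(ci, centres[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def mean_aggregate(features, edges):
    """One round of neighbourhood averaging (each node includes itself).

    Stands in for a message-passing layer; no learned parameters.
    """
    neighbours = [[i] for i in range(len(features))]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[m][d] for m in neigh) / len(neigh) for d in range(dim)]
        for neigh in neighbours
    ]


def graph_embedding(node_features):
    """Mean-pool node features into a single graph-level vector."""
    n = len(node_features)
    dim = len(node_features[0])
    return [sum(f[d] for f in node_features) / n for d in range(dim)]
```

In use, the boxes would come from an OCR engine and the initial node features from a text or image encoder; the pooled vector would then feed a classifier head.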

Related papers

Leveraging Contrastive Learning for a Similarity-Guided Tampered Document Data Generation Pipeline [6.066442015301665]
We propose a novel method for generating high-quality tampered document images. We first train an auxiliary network to compare text crops, leveraging contrastive learning with a novel strategy for defining positive pairs and their corresponding negatives. Using a carefully designed generation pipeline, we introduce a framework capable of producing diverse, high-quality tampered document images.
arXiv Detail & Related papers (2026-02-19T12:39:38Z)
Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy. We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels. Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z)
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation [14.511401955827875]
Object detection in documents is a key step to automate the structural elements identification process. We present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image.
arXiv Detail & Related papers (2024-02-17T23:08:32Z)
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model. We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models. We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a well-known problem in the document research community. With growing internet connectivity in personal life, an enormous number of documents has become available in the public domain. We address this challenge using self-supervision, in a way that differs from the few existing self-supervised document segmentation approaches.
arXiv Detail & Related papers (2023-05-01T12:47:55Z)
Text Representation Enrichment Utilizing Graph based Approaches: Stock Market Technical Analysis Case Study [0.0]
We propose a transductive hybrid approach composed of an unsupervised node representation learning model followed by a node classification/edge prediction model. The proposed model is developed to classify stock market technical analysis reports, which to our knowledge is the first work in this domain.
arXiv Detail & Related papers (2022-11-29T11:26:08Z)
Similarity-aware Positive Instance Sampling for Graph Contrastive Pre-training [82.68805025636165]
We propose to select positive graph instances directly from existing graphs in the training set. Our selection is based on certain domain-specific pair-wise similarity measurements. Besides, we develop an adaptive node-level pre-training method to dynamically mask nodes to distribute them evenly in the graph.
arXiv Detail & Related papers (2022-06-23T20:12:51Z)
A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search. We exploit both intra-session and inter-session information to address the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
Test-Time Adaptation for Visual Document Understanding [34.79168501080629]
DocTTA is a novel test-time adaptation method for documents. It does source-free domain adaptation using unlabeled target document data. We introduce new benchmarks using existing public datasets for various VDU tasks.
arXiv Detail & Related papers (2022-06-15T01:57:12Z)
Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network. For evaluation, we provide a benchmark pipeline for the evaluation of temporal network embeddings. We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z)
Robust Document Representations using Latent Topics and Metadata [17.306088038339336]
We propose a novel approach to fine-tuning a pre-trained neural language model for document classification problems. We generate document representations that capture both text and metadata artifacts in a task manner. Our solution also incorporates metadata explicitly rather than just augmenting them with text.
arXiv Detail & Related papers (2020-10-23T21:52:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.