Visual FUDGE: Form Understanding via Dynamic Graph Editing
- URL: http://arxiv.org/abs/2105.08194v1
- Date: Mon, 17 May 2021 23:18:39 GMT
- Title: Visual FUDGE: Form Understanding via Dynamic Graph Editing
- Authors: Brian Davis, Bryan Morse, Brian Price, Chris Tensmeyer, Curtis
Wiginton
- Abstract summary: The proposed FUDGE model formulates this problem on a graph of text elements.
It uses a Graph Convolutional Network to predict changes to the graph.
FUDGE is state-of-the-art on the historical NAF dataset.
- Score: 2.012425476229879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of form understanding: finding text entities and the
relationships/links between them in form images. The proposed FUDGE model
formulates this problem on a graph of text elements (the vertices) and uses a
Graph Convolutional Network to predict changes to the graph. The initial
vertices are detected text lines and do not necessarily correspond to the final
text entities, which can span multiple lines. Also, initial edges contain many
false-positive relationships. FUDGE edits the graph structure by combining text
segments (graph vertices) and pruning edges in an iterative fashion to obtain
the final text entities and relationships. While recent work in this area has
focused on leveraging large-scale pre-trained Language Models (LM), FUDGE
achieves the same level of entity linking performance on the FUNSD dataset by
learning only visual features from the (small) provided training set. FUDGE can
be applied on forms where text recognition is difficult (e.g. degraded or
historical forms) and on forms in resource-poor languages where pre-training
such LMs is challenging. FUDGE is state-of-the-art on the historical NAF
dataset.
Related papers
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models [33.3678293782131]
This work studies self-supervised graph learning for text-attributed graphs (TAGs)
We aim to improve view generation through language supervision.
This is driven by the prevalence of textual attributes in real applications, which complement graph structures with rich semantic information.
arXiv Detail & Related papers (2024-06-17T17:49:19Z) - TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations [15.873944819608434]
Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions.
This paper introduces a new self-supervised learning framework, Text-And-Graph Multi-View Alignment (TAGA), which integrates TAGs' structural and semantic dimensions.
Our framework demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world datasets.
arXiv Detail & Related papers (2024-05-27T03:40:16Z) - Pretraining Language Models with Text-Attributed Heterogeneous Graphs [28.579509154284448]
We present a new pretraining framework for Language Models (LMs) that explicitly considers the topological and heterogeneous information in Text-Attributed Heterogeneous Graphs (TAHGs)
We propose a topology-aware pretraining task to predict nodes involved in the context graph by jointly optimizing an LM and an auxiliary heterogeneous graph neural network.
We conduct link prediction and node classification tasks on three datasets from various domains.
arXiv Detail & Related papers (2023-10-19T08:41:21Z) - Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs.
In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations.
We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
arXiv Detail & Related papers (2023-10-10T14:59:22Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG)
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
arXiv Detail & Related papers (2023-05-23T17:53:30Z) - Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing the incremental structure expanding (ISE)
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z) - JointGT: Graph-Text Joint Representation Learning for Text Generation
from Knowledge Graphs [44.06715423776722]
We propose a graph-text joint representation learning model called JointGT.
During encoding, we devise a structure-aware semantic aggregation module which is plugged into each Transformer layer.
We show that JointGT obtains new state-of-the-art performance on various KG-to-text datasets.
arXiv Detail & Related papers (2021-06-19T14:10:10Z) - GraphFormers: GNN-nested Transformers for Representation Learning on
Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graph.
arXiv Detail & Related papers (2021-05-06T12:20:41Z) - Graph Edit Distance Reward: Learning to Edit Scene Graph [69.39048809061714]
We propose a new method to edit the scene graph according to the user instructions, which has never been explored.
To be specific, in order to learn editing scene graphs as the semantics given by texts, we propose a Graph Edit Distance Reward.
In the context of text-editing image retrieval, we validate the effectiveness of our method in CSS and CRIR dataset.
arXiv Detail & Related papers (2020-08-15T04:52:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.