Contextual Graph Transformer: A Small Language Model for Enhanced Engineering Document Information Extraction
- URL: http://arxiv.org/abs/2508.02532v1
- Date: Mon, 04 Aug 2025 15:41:35 GMT
- Title: Contextual Graph Transformer: A Small Language Model for Enhanced Engineering Document Information Extraction
- Authors: Karan Reddy, Mayukha Pal
- Abstract summary: Contextual Graph Transformer (CGT) is a hybrid neural architecture that combines Graph Neural Networks (GNNs) and Transformers for domain-specific question answering. CGT constructs a dynamic graph over input tokens using sequential, skip-gram, and semantic similarity edges. It outperforms baselines like GPT-2 and BERT, achieving 24.7% higher accuracy than GPT-2 with 62.4% fewer parameters.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Standard transformer-based language models, while powerful for general text, often struggle with the fine-grained syntax and entity relationships in complex technical and engineering documents. To address this, we propose the Contextual Graph Transformer (CGT), a hybrid neural architecture that combines Graph Neural Networks (GNNs) and Transformers for domain-specific question answering. CGT constructs a dynamic graph over input tokens using sequential, skip-gram, and semantic similarity edges, which is processed by GATv2Conv layers for local structure learning. These enriched embeddings are then passed to a Transformer encoder to capture global dependencies. Technical domains often require specialized language models with stronger contextualization and structure awareness than generic large models provide, and CGT offers a parameter-efficient solution for such use cases. Integrated into a Retrieval-Augmented Generation (RAG) pipeline, CGT outperforms baselines like GPT-2 and BERT, achieving 24.7% higher accuracy than GPT-2 with 62.4% fewer parameters. This gain stems from CGT's ability to jointly model structural token interactions and long-range semantic coherence. The model is trained from scratch using a two-phase approach: pretraining on general text followed by fine-tuning on domain-specific manuals. This highlights CGT's adaptability to technical language, enabling better grounding, entity tracking, and retrieval-augmented responses in real-world applications.
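The pipeline described in the abstract (token graph with sequential, skip-gram, and semantic-similarity edges, GATv2Conv layers for local structure, then a Transformer encoder for global context) can be sketched compactly. The code below is a minimal, illustrative approximation assuming a PyTorch / PyTorch Geometric stack; the skip distance, similarity threshold, layer widths, and head counts are placeholder assumptions, not the authors' reported hyperparameters.

```python
# Minimal sketch of the CGT idea described in the abstract (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv


def build_token_graph(token_emb: torch.Tensor, skip: int = 2, sim_thresh: float = 0.7):
    """Build edges over a token sequence: sequential, skip-gram, and
    semantic-similarity links. Skip distance and threshold are assumed values."""
    n = token_emb.size(0)
    edges = [(i, i + 1) for i in range(n - 1)]            # sequential edges
    edges += [(i, i + skip) for i in range(n - skip)]     # skip-gram edges
    sim = F.cosine_similarity(                            # semantic-similarity edges
        token_emb.unsqueeze(1), token_emb.unsqueeze(0), dim=-1)
    pairs = (sim > sim_thresh).nonzero(as_tuple=False)
    edges += [(int(a), int(b)) for a, b in pairs if int(a) != int(b)]
    return torch.tensor(edges, dtype=torch.long).t().contiguous()


class ContextualGraphTransformerSketch(nn.Module):
    """GATv2 layers for local structure learning, followed by a Transformer
    encoder for global dependencies, as outlined in the abstract."""

    def __init__(self, vocab_size: int, dim: int = 256, heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gnn1 = GATv2Conv(dim, dim // heads, heads=heads)
        self.gnn2 = GATv2Conv(dim, dim // heads, heads=heads)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)                         # (seq_len, dim)
        edge_index = build_token_graph(x.detach())
        x = F.relu(self.gnn1(x, edge_index))              # local structure
        x = F.relu(self.gnn2(x, edge_index))
        x = self.encoder(x.unsqueeze(0)).squeeze(0)       # global context
        return self.lm_head(x)                            # per-token logits
```

The paper additionally integrates the model into a RAG pipeline and trains it in two phases (general-text pretraining, then fine-tuning on domain-specific manuals); those steps are omitted from this sketch.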
Related papers
- Plain Transformers Can be Powerful Graph Learners [64.50059165186701]
Researchers have attempted to migrate Transformers to graph learning, but most advanced Graph Transformers have strayed far from plain Transformers.
This work demonstrates that the plain Transformer architecture can be a powerful graph learner.
arXiv Detail & Related papers (2025-04-17T02:06:50Z)
- Transducer Tuning: Efficient Model Adaptation for Software Tasks Using Code Property Graphs [8.26418657158164]
The approach adapts large models for downstream code tasks using Code Property Graphs (CPGs).
It introduces a modular component called a transducer that enriches code embeddings with structural and dependency information from CPGs.
Results demonstrate competitive performance compared to full-parameter fine-tuning while reducing trainable parameters by up to 99% to save memory.
arXiv Detail & Related papers (2024-12-18T03:25:17Z)
- GL-Fusion: Rethinking the Combination of Graph Neural Network and Large Language model [63.774726052837266]
We introduce a new architecture that deeply integrates Graph Neural Networks (GNNs) with Large Language Models (LLMs).
We introduce three key innovations: (1) Structure-Aware Transformers, which incorporate GNN's message-passing capabilities directly into LLM's transformer layers; (2) Graph-Text Cross-Attention, which processes full, uncompressed text from graph nodes and edges; and (3) GNN-LLM Twin Predictor, enabling LLM's flexible autoregressive generation alongside GNN's scalable one-pass prediction.
arXiv Detail & Related papers (2024-12-08T05:49:58Z)
- Learning Graph Quantized Tokenizers [28.79505338383552]
Graph Transformers (GTs) have emerged as leading models in geometric deep learning.
We introduce GQT (Graph Quantized Tokenizer), which decouples tokenizer training from Transformer training.
By combining the GQT with token modulation, a Transformer encoder achieves state-of-the-art performance on 20 out of 22 benchmarks.
arXiv Detail & Related papers (2024-10-17T17:38:24Z)
- How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models [42.46104516313823]
Dependency Transformer Grammars (DTGs) are a new class of Transformer language model with explicit dependency-based inductive bias.
DTGs simulate dependency transition systems with constrained attention patterns.
They achieve better generalization while maintaining comparable perplexity with Transformer language model baselines.
arXiv Detail & Related papers (2024-07-24T16:38:38Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- SIT3: Code Summarization with Structure-Induced Transformer [48.000063280183376]
We propose a novel model based on structure-induced self-attention, which encodes sequential inputs with highly effective structure modeling.
Our newly-proposed model achieves new state-of-the-art results on popular benchmarks.
arXiv Detail & Related papers (2020-12-29T11:37:43Z)
- E.T.: Entity-Transformers. Coreference augmented Neural Language Model for richer mention representations via Entity-Transformer blocks [3.42658286826597]
We present an extension of the Transformer-block architecture used in neural language models, specifically in GPT2.
Our model, GPT2E, extends the Transformer-layer architecture of GPT2 to Entity-Transformers, an architecture designed to handle coreference information when present.
arXiv Detail & Related papers (2020-11-10T22:28:00Z)