SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction
- URL: http://arxiv.org/abs/2511.07584v1
- Date: Wed, 12 Nov 2025 01:05:36 GMT
- Title: SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction
- Authors: Wuyang Zhang, Chenkai Zhang, Zhen Luo, Jianming Ma, Wangming Yuan, Chuqiao Gu, Chenwei Feng,
- Abstract summary: Large language models (LLMs) have transformed software development by enabling automated code generation, yet they frequently suffer from systematic errors that limit practical deployment. We identify two critical failure modes: \textit{logical hallucination} (incorrect control/data-flow reasoning) and \textit{schematic hallucination} (type mismatches, signature violations, and architectural inconsistencies). This paper presents \textbf{SemanticForge}, which introduces four fundamental algorithmic advances for semantically-aware code generation.
- Score: 7.46733617565624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have transformed software development by enabling automated code generation, yet they frequently suffer from systematic errors that limit practical deployment. We identify two critical failure modes: \textit{logical hallucination} (incorrect control/data-flow reasoning) and \textit{schematic hallucination} (type mismatches, signature violations, and architectural inconsistencies). These errors stem from the absence of explicit, queryable representations of repository-wide semantics. This paper presents \textbf{SemanticForge}, which introduces four fundamental algorithmic advances for semantically-aware code generation: (1) a novel automatic reconciliation algorithm for dual static-dynamic knowledge graphs, unifying compile-time and runtime program semantics; (2) a neural approach that learns to generate structured graph queries from natural language, achieving 73\% precision versus 51\% for traditional retrieval; (3) a novel beam search algorithm with integrated SMT solving, enabling real-time constraint verification during generation rather than post-hoc validation; and (4) an incremental maintenance algorithm that updates knowledge graphs in $O(|\Delta R| \cdot \log n)$ time while maintaining semantic equivalence.
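To make the third advance concrete, the sketch below shows one way SMT-checked beam search can prune schematically inconsistent candidates during decoding rather than after generation. It is a minimal illustration under stated assumptions, not the paper's implementation: the vocabulary, the stand-in language-model score, and the integer "width" constraints are all hypothetical, and Z3 (via the z3-solver package) stands in for whichever solver SemanticForge actually integrates.

```python
# Hedged sketch: beam search with an SMT solver in the loop (assumed design, not the paper's code).
# Each partial candidate accumulates "schematic" facts; before it is kept in the beam,
# those facts are checked with Z3, and unsatisfiable candidates are pruned mid-generation.
from z3 import Solver, Int, sat


def toy_lm_score(prefix, token):
    """Stand-in for an LLM log-probability; shorter tokens score higher here."""
    return -len(token) - 0.01 * len(prefix)


def constraints_satisfiable(facts):
    """Encode the collected facts as Z3 constraints and check satisfiability."""
    solver = Solver()
    width = Int("width")
    for name, value in facts:
        if name == "width_eq":
            solver.add(width == value)   # e.g. a call site fixes the width to a literal
        elif name == "width_ge":
            solver.add(width >= value)   # e.g. a signature requires a minimum width
    return solver.check() == sat


def constrained_beam_search(vocab, fact_of, beam_width=2, max_len=3):
    beams = [([], 0.0, [])]  # (tokens, score, accumulated facts)
    for _ in range(max_len):
        candidates = []
        for tokens, score, facts in beams:
            for token in vocab:
                new_facts = facts + fact_of.get(token, [])
                if not constraints_satisfiable(new_facts):
                    continue  # prune: schematic inconsistency detected during generation
                candidates.append((tokens + [token],
                                   score + toy_lm_score(tokens, token),
                                   new_facts))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams


if __name__ == "__main__":
    # Hypothetical vocabulary: "call_a" fixes width to 8, "decl_16" demands width >= 16,
    # so any candidate containing both is pruned by the solver.
    vocab = ["call_a", "decl_16", "pad"]
    fact_of = {"call_a": [("width_eq", 8)], "decl_16": [("width_ge", 16)]}
    for tokens, score, _ in constrained_beam_search(vocab, fact_of):
        print(tokens, round(score, 2))
```

The same pattern would extend to the paper's other constraint classes (type and signature consistency) by swapping the toy width facts for predicates extracted from the repository knowledge graph; the design choice illustrated here is simply that satisfiability is checked at every expansion step instead of once on the finished output.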
Related papers
- AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms [54.99368693313797]
Existing benchmarks test only individual languages/tools, so the performance numbers are not directly comparable. We address this gap with AlgoVeri, a benchmark that evaluates vericoding of $77$ classical algorithms in Dafny, Verus, and Lean.
arXiv Detail & Related papers (2026-02-10T06:58:26Z) - GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation [12.69955054591315]
We introduce Graph-Retrieved Adaptive Decoding (GRAD), a decoding-time method that grounds generation in corpus-derived evidence without retraining. Across three models and a range of question-answering benchmarks, GRAD consistently surpasses baselines. GRAD offers a lightweight, plug-and-play alternative to contrastive decoding and knowledge graph augmentation.
arXiv Detail & Related papers (2025-11-05T22:51:16Z) - Truth-Aware Decoding: A Program-Logic Approach to Factual Language Generation [0.2864713389096699]
This paper introduces Truth-Aware Decoding (TAD), a verification-oriented decoding scheme that aligns neural language generation with knowledge bases. Our contributions are fourfold: (i) a constraint-based semantics that renders oracle filtering as a program-logic judgment, (ii) a proof that greedy selection enjoys local likelihood dominance under sound and complete guards, and (iii) an entropy-style invariant that quantifies factual risk via knowledge-aware safe mass.
arXiv Detail & Related papers (2025-10-03T22:11:15Z) - SLICET5: Static Program Slicing using Language Models with Copy Mechanism and Constrained Decoding [13.61350801915956]
Static program slicing is a fundamental technique in software engineering. SLICET5 is a novel slicing framework that reformulates static program slicing as a sequence-to-sequence task. SLICET5 consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-22T03:14:47Z) - Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [79.75818239774952]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information. Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system. We propose Align-GRAG, a novel reasoning-guided dual alignment framework for the post-retrieval phase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z) - DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model [13.532046953850902]
We present DeepRTL, a unified representation model that excels in both Verilog understanding and generation. Based on CodeT5+, DeepRTL is fine-tuned on a comprehensive dataset that aligns Verilog code with rich, multi-level natural language descriptions. We introduce the first benchmark for Verilog understanding and take the initiative to apply embedding similarity and GPT Score to evaluate the models' understanding capabilities.
arXiv Detail & Related papers (2025-02-20T11:07:55Z) - NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition [80.22784377150465]
Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding.
This paper makes the first attempt to build a novel bottom-up Non-AutoRegressive Modeling approach for HMER, called NAMER.
NAMER comprises a Visual Aware Tokenizer (VAT) and a Parallel Graph Decoder (PGD).
arXiv Detail & Related papers (2024-07-16T04:52:39Z) - Momentum Decoding: Open-ended Text Generation As Graph Exploration [49.812280360794894]
Open-ended text generation with autoregressive language models (LMs) is one of the core tasks in natural language processing.
We formulate open-ended text generation from a new perspective, i.e., we view it as an exploration process within a directed graph.
We propose a novel decoding method -- \textit{momentum decoding} -- which encourages the LM to explore new nodes outside the current graph.
arXiv Detail & Related papers (2022-12-05T11:16:47Z) - Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, which motivates their automatic completion.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, resort to graph triples' text and triple-level contextualized representations.
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.