VeriGRAG: Enhancing LLM-Based Verilog Code Generation with Structure-Aware Soft Prompts
- URL: http://arxiv.org/abs/2510.15914v1
- Date: Sat, 27 Sep 2025 10:23:36 GMT
- Title: VeriGRAG: Enhancing LLM-Based Verilog Code Generation with Structure-Aware Soft Prompts
- Authors: Jiayu Zhao, Song Chen,
- Abstract summary: We propose a novel framework that extracts structural graph embeddings from Verilog code using graph neural networks (GNNs). A multimodal retriever then selects the graph embeddings most relevant to the given generation task. Experiments demonstrate that VeriGRAG substantially improves the correctness of Verilog code generation.
- Score: 4.227182480042518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated strong capabilities in generating Verilog code from natural language descriptions. However, Verilog code inherently encodes structural information of hardware circuits. Effectively leveraging this structural information to enhance the functional and syntactic correctness of LLM-generated Verilog code remains a significant challenge. To address this challenge, we propose VeriGRAG, a novel framework that extracts structural graph embeddings from Verilog code using graph neural networks (GNNs). A multimodal retriever then selects the graph embeddings most relevant to the given generation task, which are aligned with the code modality through the VeriFormer module to generate structure-aware soft prompts. Our experiments demonstrate that VeriGRAG substantially improves the correctness of Verilog code generation, achieving state-of-the-art or superior performance across both VerilogEval and RTLLM benchmarks.
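The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity measure (cosine), the function names `cosine` and `retrieve_top_k`, and the three-dimensional toy vectors standing in for GNN outputs are all assumptions made for illustration. The GNN encoder and the VeriFormer alignment that produces the soft prompts are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, graph_embeddings, k=2):
    """Return the names of the k stored embeddings most similar to the query."""
    ranked = sorted(
        graph_embeddings,
        key=lambda name: cosine(query, graph_embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical stand-ins for graph embeddings a GNN might produce per Verilog module.
GRAPH_EMBEDDINGS = {
    "counter": [0.9, 0.1, 0.0],
    "fsm": [0.1, 0.9, 0.2],
    "adder": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the generation task (e.g. from the text description).
query = [0.8, 0.3, 0.1]
print(retrieve_top_k(query, GRAPH_EMBEDDINGS, k=2))  # ['counter', 'fsm']
```

In a full pipeline along the lines of the abstract, the retrieved embeddings would then be mapped into the language model's input space as soft prompts; how that mapping is done is specific to the paper and is not reproduced here.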
Related papers
- QiMeng-CRUX: Narrowing the Gap between Natural Language and Verilog via Core Refined Understanding eXpression [48.84841760215598]
Large language models (LLMs) have shown promising capabilities in hardware description language (HDL) generation. Existing approaches often rely on free-form natural language descriptions that are ambiguous, redundant, and unstructured. We treat hardware code generation as a complex transformation from an open-ended natural language space to a domain-specific, highly constrained target space. We introduce Core Refined Understanding eXpression (CRUX), a structured intermediate space that captures the essential semantics of user intent while organizing the expression for precise Verilog code generation.
arXiv Detail & Related papers (2025-11-25T09:17:32Z)
- QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation [47.82802346420197]
We propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV). We verify the functional correctness of signals in the generated module by comparing them with those of the reference module in the training data. Finally, we introduce signal-aware DPO, which is optimized on the correct signal-level code segments.
arXiv Detail & Related papers (2025-10-22T06:58:07Z)
- CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation [69.684886175768]
Large language models (LLMs) have shown promising performance in automated code generation. In this paper, we propose CodeRAG, a retrieval-augmented code generation framework. Experiments show that CodeRAG achieves significant improvements compared to no RAG scenarios.
arXiv Detail & Related papers (2025-04-14T09:51:23Z)
- Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs [53.00384299879513]
In large language models (LLMs), code and reasoning reinforce each other. Code provides verifiable execution paths, enforces logical decomposition, and enables runtime validation. We identify key challenges and propose future research directions to strengthen this synergy.
arXiv Detail & Related papers (2025-02-26T18:55:42Z)
- DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model [13.532046953850902]
We present DeepRTL, a unified representation model that excels in both Verilog understanding and generation. Based on CodeT5+, DeepRTL is fine-tuned on a comprehensive dataset that aligns Verilog code with rich, multi-level natural language descriptions. We introduce the first benchmark for Verilog understanding and take the initiative to apply embedding similarity and GPT Score to evaluate the models' understanding capabilities.
arXiv Detail & Related papers (2025-02-20T11:07:55Z)
- HiVeGen -- Hierarchical LLM-based Verilog Generation for Scalable Chip Design [24.46771930751068]
HiVeGen is a hierarchical Verilog generation framework that decomposes generation tasks into hierarchical submodules. It integrates automatic Design Space Exploration (DSE) into hierarchy-aware prompt generation, introducing weight-based retrieval to enhance code reuse. It also enables real-time human-computer interaction to lower error-correction cost, significantly improving the quality of generated designs.
arXiv Detail & Related papers (2024-12-06T19:37:53Z)
- Large Language Model for Verilog Generation with Code-Structure-Guided Reinforcement Learning [29.135207235743795]
This paper introduces VeriSeek, an LLM enhanced by reinforcement learning to achieve high Verilog code generation performance. Our reinforcement learning approach employs code structure information as feedback signals to refine the pre-trained model. Experiments show that VeriSeek outperforms state-of-the-art methods across multiple benchmarks.
arXiv Detail & Related papers (2024-07-21T11:25:21Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
CodeGRAG builds a graphical view of code blocks from their control flow and data flow to better capture programming domain knowledge. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gains for cross-lingual code generation.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- A Multi-Expert Large Language Model Architecture for Verilog Code Generation [5.159745269633967]
This paper introduces an innovative multi-expert LLM architecture for Verilog code generation (MEV-LLM).
Our architecture uniquely integrates multiple LLMs, each specifically fine-tuned with a dataset that is categorized with respect to a distinct level of design complexity.
Empirical evidence from experiments highlights notable improvements in terms of the percentage of generated Verilog outputs that are syntactically and functionally correct.
arXiv Detail & Related papers (2024-04-11T16:58:29Z)
- Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework [50.02710905062184]
This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts.
The accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% with the same benchmark.
arXiv Detail & Related papers (2024-03-17T13:01:03Z)
- VerilogEval: Evaluating Large Language Models for Verilog Code Generation [6.88526119890374]
We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits.
The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines.
arXiv Detail & Related papers (2023-09-14T09:15:34Z)