QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation
- URL: http://arxiv.org/abs/2510.19296v2
- Date: Tue, 04 Nov 2025 08:39:14 GMT
- Title: QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation
- Authors: Yang Zhang, Rui Zhang, Jiaming Guo, Lei Huang, Di Huang, Yunpu Zhao, Shuyao Cheng, Pengwei Jin, Chongxiao Li, Zidong Du, Xing Hu, Qi Guo, Yunji Chen
- Abstract summary: We propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV). We verify the functional correctness of signals in generated module by comparing with that of reference module in the training data. Finally, we introduce signal-aware DPO which is optimized on the correct signal-level code segments.
- Score: 47.82802346420197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable progress of Large Language Models (LLMs) presents promising opportunities for Verilog code generation, which is important for automated circuit design. The lack of meaningful functional rewards hinders preference optimization based on Reinforcement Learning (RL) for producing functionally correct Verilog code. In this paper, we propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV), which leverages code segments of functionally correct output signals to optimize RL training. Because Verilog code specifies the structural interconnection of hardware gates and wires, so that different output signals are independent, the key insight of QiMeng-SALV is to extract verified signal-aware implementations from partially incorrect modules, so as to enhance the extraction of meaningful functional rewards. Roughly, we verify the functional correctness of signals in the generated module by comparing them with those of the reference module in the training data. Then, an abstract syntax tree (AST) is employed to identify signal-aware code segments, which can provide meaningful functional rewards from erroneous modules. Finally, we introduce signal-aware DPO, which is optimized on the correct signal-level code segments, thereby preventing noise and interference from incorrect signals. The proposed QiMeng-SALV underscores the paradigm shift from conventional module-level to fine-grained signal-level optimization in Verilog code generation, addressing the issue of insufficient functional rewards. Experiments demonstrate that our method achieves state-of-the-art performance on VerilogEval and RTLLM, with a 7B parameter model matching the performance of the DeepSeek v3 671B model and significantly outperforming the leading open-source model CodeV trained on the same dataset. Our code is available at https://github.com/zy1xxx/SALV.
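The three steps in the abstract (per-signal verification, selecting signal-aware code segments, and a DPO loss restricted to those segments) can be illustrated with a minimal Python sketch. This is not the authors' implementation (that lives in the linked repository). The function names, the half-adder traces, and the per-token signal tags are hypothetical, and the tags are assumed to come from an AST pass that is not shown here. The loss is the standard DPO objective, applied to log-probabilities summed only over tokens tagged with a verified signal, which is one plausible reading of "signal-aware DPO".

```python
import math

def correct_signals(gen_trace, ref_trace):
    """Output signals whose simulated values match the reference on every test vector.

    Both arguments map a signal name to its list of simulated values
    (hypothetical format; a real flow would read these from a simulator).
    """
    return {name for name, values in ref_trace.items()
            if gen_trace.get(name) == values}

def masked_logprob(token_logps, token_signals, keep):
    """Sum token log-probs, keeping only tokens tagged with a signal in `keep`.

    `token_signals` is assumed to be produced by an AST pass that assigns each
    token to the output signal it drives (or None); that pass is not shown.
    """
    return sum(lp for lp, sig in zip(token_logps, token_signals) if sig in keep)

def dpo_loss(pol_chosen, ref_chosen, pol_rejected, ref_rejected, beta=0.1):
    """Standard DPO loss: -log(sigmoid(beta * margin)), written in a stable form."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))

# Hypothetical half adder: the generated module gets `sum` right, `carry` wrong.
ref = {"sum": [0, 1, 1, 0], "carry": [0, 0, 0, 1]}
gen = {"sum": [0, 1, 1, 0], "carry": [0, 0, 0, 0]}
verified = correct_signals(gen, ref)  # only "sum" survives

# Only tokens belonging to the verified signal contribute to the log-prob.
logps = [-1.0, -2.0, -3.0]
tags = ["sum", "carry", "sum"]
chosen_lp = masked_logprob(logps, tags, verified)
```

In this sketch the wrong `carry` logic contributes nothing to the training signal, while the correct `sum` logic from the same partially incorrect module is still usable, which is the module-level to signal-level shift the abstract describes.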
Related papers
- CodeCircuit: Toward Inferring LLM-Generated Code Correctness via Attribution Graphs [13.488544043942495]
We aim to investigate whether the model's neural dynamics encode internally decodable signals that are predictive of logical validity during code generation. By decomposing complex residual flows, we aim to identify the structural signatures that distinguish sound reasoning from logical failure. Analysis across Python, C++, and Java confirms that intrinsic correctness signals are robust across diverse syntaxes.
arXiv Detail & Related papers (2026-02-06T03:49:15Z)
- Aletheia: What Makes RLVR For Code Verifiers Tick? [51.371034079170435]
Verifiers trained via Reinforcement Learning from Verifiable Rewards (RLVR) are a prominent fixture of the Large Language Model (LLM) post-training pipeline. Code verifiers remain valuable toward judging model outputs in scenarios where execution feedback is hard to obtain. We examine components of the RLVR-based verifier training recipe widely credited for its success.
arXiv Detail & Related papers (2026-01-17T22:30:45Z)
- EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation [7.512194032034432]
Reinforcement Learning with Verifiable Rewards (RLVR) offers a promising approach to bridge the gap between model capability and real-world RTL design. We present EARL, an Entropy-Aware Reinforcement Learning framework for Verilog generation.
arXiv Detail & Related papers (2025-11-15T05:00:07Z)
- CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment [98.87395842351627]
Large Language Models (LLMs) excel at code generation by learning from vast code corpora. A fundamental semantic gap remains between their training on textual patterns and the goal of functional correctness. We propose CodeRL+, a novel approach that integrates execution semantics alignment into the RLVR training pipeline for code generation.
arXiv Detail & Related papers (2025-10-21T09:48:06Z)
- VERIRL: Boosting the LLM-based Verilog Code Generation via Reinforcement Learning [32.974199255760944]
We introduce a reinforcement learning framework tailored for Verilog code generation. To tackle the problem of sparse and noisy reward signals, we propose a Trace-back based Rescore mechanism. To mitigate catastrophic forgetting and overfitting during RL fine-tuning, we introduce a sample-balanced weighting strategy.
arXiv Detail & Related papers (2025-08-25T20:20:44Z)
- DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.19756761027351]
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models. We investigate their denoising processes and reinforcement learning methods. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.
arXiv Detail & Related papers (2025-06-25T17:35:47Z)
- QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation [51.393569044134445]
Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification. Extending RLVR to automatically generating hardware description languages (HDLs) like Verilog from natural-language (NL) specifications, however, poses three key challenges. We introduce CodeV-R1, an RLVR framework for training Verilog generation LLMs.
arXiv Detail & Related papers (2025-05-30T03:51:06Z)
- Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback [36.69082579950107]
Large language models (LLMs) have shown strong performance in Verilog generation from natural language description. This paper introduces a method that integrates verification insights from testbench into the training of Verilog generation LLMs.
arXiv Detail & Related papers (2025-04-22T11:38:14Z)
- Factor Graph Optimization of Error-Correcting Codes for Belief Propagation Decoding [62.25533750469467]
Low-Density Parity-Check (LDPC) codes possess several advantages over other families of codes. The proposed approach is shown to outperform the decoding performance of existing popular codes by orders of magnitude.
arXiv Detail & Related papers (2024-06-09T12:08:56Z)
- BetterV: Controlled Verilog Generation with Discriminative Guidance [11.162807308782751]
We propose a Verilog generation framework, BetterV, which fine-tunes the large language models (LLMs) on processed domain-specific runtime. BetterV has the ability to generate syntactically and functionally correct Verilog, which can outperform GPT-4 on the VerilogEval benchmark.
arXiv Detail & Related papers (2024-02-03T08:00:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.