VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs
- URL: http://arxiv.org/abs/2507.14776v1
- Date: Sun, 20 Jul 2025 00:28:55 GMT
- Title: VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs
- Authors: Kimia Tasnia, Alexander Garcia, Tasnuva Farheen, Sazadur Rahman,
- Abstract summary: VeriOpt is a novel framework that leverages role-based prompting and PPA-aware optimization to produce high-quality, synthesizable Verilog.<n>Our work advances the state-of-the-art AI-driven hardware design by addressing the critical gap between correctness and quality.
- Score: 41.94295877935867
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The rapid adoption of large language models(LLMs) in hardware design has primarily focused on generating functionally correct Verilog code, overlooking critical Power Performance-Area(PPA) metrics essential for industrial-grade designs. To bridge this gap, we propose VeriOpt, a novel framework that leverages role-based prompting and PPA-aware optimization to enable LLMs to produce high-quality, synthesizable Verilog. VeriOpt structures LLM interactions into specialized roles (e.g., Planner, Programmer, Reviewer, Evaluator) to emulate human design workflows, while integrating PPA constraints directly into the prompting pipeline. By combining multi-modal feedback (e.g., synthesis reports, timing diagrams) with PPA aware prompting, VeriOpt achieves PPA-efficient code generation without sacrificing functional correctness. Experimental results demonstrate up to 88% reduction in power, 76% reduction in area and 73% improvement in timing closure compared to baseline LLM-generated RTL, validated using industry standard EDA tools. At the same time achieves 86% success rate in functionality evaluation. Our work advances the state-of-the-art AI-driven hardware design by addressing the critical gap between correctness and quality, paving the way for reliable LLM adoption in production workflows.
Related papers
- From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs [48.83701310501069]
Large language models (LLMs) have achieved notable performance in code synthesis.<n>We introduce a performance-aware, closed-loop solution that enables LLMs to autonomously engineer optimal transformations.<n>We fine-tune LLMs with Low-Rank Adaptation on a novel repository of more than 6,000 empirically evaluated PyTorch augmentation functions.
arXiv Detail & Related papers (2026-01-07T11:13:02Z) - A New Benchmark for the Appropriate Evaluation of RTL Code Optimization [11.115027718178759]
This work introduces RTL-OPT, a benchmark for assessing the capability of large language models (LLMs) in RTL optimization.<n>Each task provides a pair of RTL codes, a suboptimal version and a human-optimized reference that reflects industry-proven optimization patterns.<n>Furthermore, RTL-OPT integrates an automated evaluation framework to verify functional correctness and quantify improvements.
arXiv Detail & Related papers (2026-01-05T03:47:26Z) - LLM-VeriPPA: Power, Performance, and Area Optimization aware Verilog Code Generation with Large Language Models [15.396388099390185]
This paper delves into the field of chip design using Large Language Models (LLMs)<n>We introduce a novel framework VeriPPA designed to optimize PPA and generate Verilog code using LLMs.<n>Our framework achieves an 81.37% success rate in syntactic correctness and 62.06% in functional correctness for code genera- tion, outperforming current state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2025-09-10T22:49:50Z) - ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning [32.11086992218369]
ChipSeek-R1 is a hierarchical reward-driven reinforcement learning framework for large language models.<n>It generates RTL code that is both functional correctness and PPA-optimized.<n>On the RTLLM benchmark, ChipSeek-R1 generated 27 RTL designs surpassing the PPA metrics of the original human-written code.
arXiv Detail & Related papers (2025-07-07T08:08:20Z) - PRO-V: An Efficient Program Generation Multi-Agent System for Automatic RTL Verification [6.983135183126461]
Pro-V is a fully program generation multi-agent system for robust RTL verification.<n>It incorporates an efficient best-of-n iterative sampling strategy to enhance the correctness of generated testbenches.<n>Pro-V attains a verification accuracy of 87.17% on golden RTL implementations and 76.28% on RTL mutants.
arXiv Detail & Related papers (2025-06-13T20:06:34Z) - Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning [95.44766931218896]
Multi-modal large language models (MLLMs) still lag behind text-based reasoning.<n>We introduce Perception-Reasoning Decoupling, which modularizes the MLLM's reasoning component and makes it easily replaceable.<n>We propose a novel reinforcement learning algorithm called Visual Perception Optimization (VPO) to align the MLLM's perceptual output with the final reasoning task.
arXiv Detail & Related papers (2025-06-05T02:28:07Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - TuRTLe: A Unified Evaluation of LLMs for RTL Generation [0.6010802600885173]
We propose TuRTLe, a unified evaluation framework designed to assess LLMs across key RTL generation tasks.<n>We benchmark a diverse set of open LLMs and analyze their strengths and weaknesses in EDA-specific tasks.<n>Our results show that reasoning-based models, such as DeepSeek R1, consistently outperform others across multiple evaluation criteria.
arXiv Detail & Related papers (2025-03-31T07:43:12Z) - LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs.<n>We introduce LLM2, a novel framework that combines an LLM with a process-based verifier.<n>LLMs2 is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z) - EDA-Aware RTL Generation with Large Language Models [0.7831852829409273]
Large Language Models (LLMs) have become increasingly popular for generating RTL code.<n> producing error-free RTL code in a zero-shot setting remains highly challenging for even state-of-the-art LLMs.<n>We introduce AIvril2, a self-verifying, LLM-agnostic agentic framework aimed at enhancing RTL code generation through iterative corrections of both syntax and functional errors.
arXiv Detail & Related papers (2024-11-21T00:37:51Z) - PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks.<n>We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z) - VinePPO: Refining Credit Assignment in RL Training of LLMs [66.80143024475635]
We propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates.<n>Our method consistently outperforms PPO and other baselines across MATH and GSM8K datasets in less wall-clock time.
arXiv Detail & Related papers (2024-10-02T15:49:30Z) - AIvril: AI-Driven RTL Generation With Verification In-The-Loop [0.7831852829409273]
Large Language Models (LLMs) are computational models capable of performing complex natural language processing tasks.
This paper introduces AIvril, a framework designed to enhance the accuracy and reliability of RTL-aware LLMs.
arXiv Detail & Related papers (2024-09-03T15:07:11Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Over-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z) - SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents [50.82665351100067]
FlowGen is a code generation framework that emulates software process models based on multiple Large Language Model (LLM) agents.
We evaluate FlowGenScrum on four benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET.
arXiv Detail & Related papers (2024-03-23T14:04:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.