Related papers: A New Benchmark for the Appropriate Evaluation of RTL Code Optimization

A New Benchmark for the Appropriate Evaluation of RTL Code Optimization

URL: http://arxiv.org/abs/2601.01765v1
Date: Mon, 05 Jan 2026 03:47:26 GMT
Title: A New Benchmark for the Appropriate Evaluation of RTL Code Optimization
Authors: Yao Lu, Shang Liu, Hangan Zhou, Wenji Fang, Qijun Zhang, Zhiyao Xie,
Abstract summary: This work introduces RTL-OPT, a benchmark for assessing the capability of large language models (LLMs) in RTL optimization.<n>Each task provides a pair of RTL codes, a suboptimal version and a human-optimized reference that reflects industry-proven optimization patterns.<n>Furthermore, RTL-OPT integrates an automated evaluation framework to verify functional correctness and quantify improvements.
Score: 11.115027718178759
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The rapid progress of artificial intelligence increasingly relies on efficient integrated circuit (IC) design. Recent studies have explored the use of large language models (LLMs) for generating Register Transfer Level (RTL) code, but existing benchmarks mainly evaluate syntactic correctness rather than optimization quality in terms of power, performance, and area (PPA). This work introduces RTL-OPT, a benchmark for assessing the capability of LLMs in RTL optimization. RTL-OPT contains 36 handcrafted digital designs that cover diverse implementation categories including combinational logic, pipelined datapaths, finite state machines, and memory interfaces. Each task provides a pair of RTL codes, a suboptimal version and a human-optimized reference that reflects industry-proven optimization patterns not captured by conventional synthesis tools. Furthermore, RTL-OPT integrates an automated evaluation framework to verify functional correctness and quantify PPA improvements, enabling standardized and meaningful assessment of generative models for hardware design optimization.

Related papers

TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization [97.18886232580131]
Large language models have demonstrated strong reasoning capabilities in complex tasks through tool integration.<n>We propose Turn-Level GRPO, a lightweight RL algorithm that performs turn-level group sampling for fine-grained optimization.
arXiv Detail & Related papers (2026-01-23T06:21:33Z)
From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs [48.83701310501069]
Large language models (LLMs) have achieved notable performance in code synthesis.<n>We introduce a performance-aware, closed-loop solution that enables LLMs to autonomously engineer optimal transformations.<n>We fine-tune LLMs with Low-Rank Adaptation on a novel repository of more than 6,000 empirically evaluated PyTorch augmentation functions.
arXiv Detail & Related papers (2026-01-07T11:13:02Z)
Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures.<n>We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z)
Rectifying LLM Thought from Lens of Optimization [48.98086817378953]
Long chain-of-thought (CoT) prompting enables thorough exploration and deliberation.<n>Despite advances, long-CoT LLMs often exhibit suboptimal reasoning behaviors.<n>We introduce RePro, a novel approach to refine LLM reasoning during post-training.
arXiv Detail & Related papers (2025-12-01T17:41:08Z)
ChipSeek-R1: Generating Human-Surpassing RTL with LLM via Hierarchical Reward-Driven Reinforcement Learning [32.11086992218369]
ChipSeek-R1 is a hierarchical reward-driven reinforcement learning framework for large language models.<n>It generates RTL code that is both functional correctness and PPA-optimized.<n>On the RTLLM benchmark, ChipSeek-R1 generated 27 RTL designs surpassing the PPA metrics of the original human-written code.
arXiv Detail & Related papers (2025-07-07T08:08:20Z)
SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning [30.938876549335067]
This paper presents SymRTLO, a novel neuron-symbolic RTL optimization framework.<n>A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic.<n> Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively.
arXiv Detail & Related papers (2025-04-14T16:15:55Z)
TuRTLe: A Unified Evaluation of LLMs for RTL Generation [0.6010802600885173]
We propose TuRTLe, a unified evaluation framework designed to assess LLMs across key RTL generation tasks.<n>We benchmark a diverse set of open LLMs and analyze their strengths and weaknesses in EDA-specific tasks.<n>Our results show that reasoning-based models, such as DeepSeek R1, consistently outperform others across multiple evaluation criteria.
arXiv Detail & Related papers (2025-03-31T07:43:12Z)
Machine Learning Framework for Early Power, Performance, and Area Estimation of RTL [0.0]
This paper proposes a pre-synthesis framework that makes early estimation of power, performance and area (PPA) metrics directly from the hardware description language (HDL) code.<n>The proposed model bridges the RTL and post-synthesis design, which will help in precisely predicting key metrics.
arXiv Detail & Related papers (2025-02-22T12:12:51Z)
Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning [59.25951947621526]
We propose an approach which can transform existing coding benchmarks into scoring and ranking datasets to evaluate the effectiveness of synthetic verifiers.<n>We release four new benchmarks (HE-R, HE-R+, MBPP-R, and MBPP-R+), and analyzed synthetic verification methods with standard, reasoning-based, and reward-based LLMs.<n>Our experiments show that reasoning can significantly improve test case generation and that scaling the number of test cases enhances the verification accuracy.
arXiv Detail & Related papers (2025-02-19T15:32:11Z)
Scaffolded Language Models with Language Supervision for Mixed-Autonomy: A Survey [52.00674453604779]
This survey organizes the literature on the design and optimization of emerging structures around post-trained LMs.<n>We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools.
arXiv Detail & Related papers (2024-10-21T18:06:25Z)
In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
RTLRewriter: Methodologies for Large Models aided RTL Code Optimization [21.61206887869307]
This paper introduces RTLRewriter, an innovative framework that leverages large models to optimize RTL code. A circuit partition pipeline is utilized for fast synthesis and efficient rewriting. A specialized search engine is designed to identify useful optimization guides, algorithms, and code snippets.
arXiv Detail & Related papers (2024-09-04T09:59:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.