RTLLM: An Open-Source Benchmark for Design RTL Generation with Large
Language Model
- URL: http://arxiv.org/abs/2308.05345v3
- Date: Sat, 11 Nov 2023 07:53:10 GMT
- Title: RTLLM: An Open-Source Benchmark for Design RTL Generation with Large
Language Model
- Authors: Yao Lu, Shang Liu, Qijun Zhang, Zhiyao Xie
- Abstract summary: We propose an open-source benchmark named RTLLM, for generating design RTL with natural language instructions.
This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution.
We also propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning.
- Score: 6.722151433412209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the recent success of large language models (LLMs) like ChatGPT,
researchers have started to explore the adoption of LLMs for agile hardware design,
such as generating design RTL based on natural-language instructions. However,
in existing works, the target designs are all relatively simple, small in
scale, and proposed by the authors themselves, making a fair comparison
among different LLM solutions challenging. In addition, many prior works focus
only on design correctness, without evaluating the quality of the
generated design RTL. In this work, we propose an open-source benchmark named
RTLLM for generating design RTL from natural-language instructions. To
systematically evaluate the auto-generated design RTL, we summarize three
progressive goals: the syntax goal, the functionality goal, and the design quality
goal. This benchmark can automatically provide a quantitative evaluation of any
given LLM-based solution. Furthermore, we propose an easy-to-use yet
surprisingly effective prompt engineering technique named self-planning, which
significantly boosts the performance of GPT-3.5 on our proposed
benchmark.
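The three progressive goals named in the abstract can be pictured as a staged check. The Python sketch below is only an illustration and is not the benchmark's actual implementation. It assumes that "progressive" means each goal is evaluated only after the previous one is met. The names `GoalReport`, `evaluate_progressively`, and `toy_syntax_check`, and the metric keys, are hypothetical. In a real flow the three callables would wrap a synthesis tool, a simulator running a testbench, and a quality report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class GoalReport:
    """Outcome of one design; None means the goal was never reached."""
    syntax_ok: bool
    functionality_ok: Optional[bool] = None
    quality: Optional[Dict[str, float]] = None


def evaluate_progressively(
    rtl: str,
    check_syntax: Callable[[str], bool],
    check_function: Callable[[str], bool],
    measure_quality: Callable[[str], Dict[str, float]],
) -> GoalReport:
    """Check the syntax, functionality, and design quality goals in order.

    Assumption: a later goal is only evaluated once the earlier one passes.
    """
    if not check_syntax(rtl):
        return GoalReport(syntax_ok=False)
    if not check_function(rtl):
        return GoalReport(syntax_ok=True, functionality_ok=False)
    return GoalReport(
        syntax_ok=True,
        functionality_ok=True,
        quality=measure_quality(rtl),
    )


def toy_syntax_check(rtl: str) -> bool:
    """Placeholder check standing in for a real syntax or synthesis run."""
    return rtl.strip().startswith("module") and "endmodule" in rtl


# Example with stand-in checkers; the quality numbers are placeholders.
report = evaluate_progressively(
    "module adder; endmodule",
    toy_syntax_check,
    lambda rtl: True,
    lambda rtl: {"area": 1.0},
)
print(report)
```

Keeping the checkers as plain callables lets the same driver be reused with whatever tools are available, and a `None` field makes it explicit which goals were never evaluated for a given design.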
Related papers
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects [0.02630859234884723]
Large Language Models (LLMs) have demonstrated potential in assisting with Register Transfer Level (RTL) design tasks.
There remains a significant gap in benchmarks that accurately reflect the complexity of real-world RTL projects.
This paper presents RTL-Repo, a benchmark designed to evaluate LLMs on large-scale RTL design projects.
arXiv Detail & Related papers (2024-05-27T17:36:01Z)
- Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs [23.766782325052418]
This paper introduces Sample Design Engineering (SDE), a methodical approach to enhancing Large Language Models' post-tuning performance.
We conduct a series of in-domain (ID) and out-of-domain (OOD) experiments to assess the impact of various design options on LLMs' downstream performance.
We propose an integrated SDE strategy, combining the most effective options, and validate its consistent superiority over sample designs in complex downstream tasks.
arXiv Detail & Related papers (2024-04-19T17:47:02Z)
- PPTC-R benchmark: Towards Evaluating the Robustness of Large Language
Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z)
- An Embarrassingly Simple Approach for LLM with Strong ASR Capacity [56.30595787061546]
We focus on solving one of the most important tasks in the field of speech processing, using speech foundation encoders and large language models (LLMs).
Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning for the LLM.
We found that delicate designs are not necessary: an embarrassingly simple composition of an off-the-shelf speech encoder, an LLM, and a linear projector as the only trainable component is competent for the ASR task.
arXiv Detail & Related papers (2024-02-13T23:25:04Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon
Sequential Task Planning [7.701407633867452]
Large Language Models (LLMs) offer the potential to improve generalizability when used as task-agnostic planners.
We introduce ISR-LLM, a novel framework that improves LLM-based planning through an iterative self-refinement process.
We show that ISR-LLM achieves markedly higher task success rates than state-of-the-art LLM-based planners.
arXiv Detail & Related papers (2023-08-26T01:31:35Z)
- ChipGPT: How far are we from natural language hardware design [34.22592995908168]
This work attempts to demonstrate an automated design environment that explores LLMs to generate hardware logic designs from natural language specifications.
We present a scalable four-stage zero-code logic design framework based on LLMs without retraining or finetuning.
arXiv Detail & Related papers (2023-05-23T12:54:02Z)
- Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM.
It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses.
We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z)
- Guiding Large Language Models via Directional Stimulus Prompting [114.84930073977672]
We introduce Directional Stimulus Prompting, a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs.
Instead of directly adjusting LLMs, our method employs a small tunable policy model to generate an auxiliary directional stimulus prompt for each input instance.
arXiv Detail & Related papers (2023-02-22T17:44:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.