VerilogEval: Evaluating Large Language Models for Verilog Code
Generation
- URL: http://arxiv.org/abs/2309.07544v2
- Date: Sun, 10 Dec 2023 00:23:04 GMT
- Title: VerilogEval: Evaluating Large Language Models for Verilog Code
Generation
- Authors: Mingjie Liu, Nathaniel Pinckney, Brucek Khailany and Haoxing Ren
- Abstract summary: We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits.
The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines.
- Score: 6.88526119890374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing popularity of large language models (LLMs) has paved the way
for their application in diverse domains. This paper proposes a benchmarking
framework tailored specifically for evaluating LLM performance in the context
of Verilog code generation for hardware design and verification. We present a
comprehensive evaluation dataset consisting of 156 problems from the Verilog
instructional website HDLBits. The evaluation set consists of a diverse set of
Verilog code generation tasks, ranging from simple combinational circuits to
complex finite state machines. The Verilog code completions can be
automatically tested for functional correctness by comparing the transient
simulation outputs of the generated design with a golden solution. We also
demonstrate that the Verilog code generation capability of pretrained language
models could be improved with supervised fine-tuning by bootstrapping with LLM
generated synthetic problem-code pairs.
Related papers
- COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool [4.027984601764008]
We propose VerilogCoder, a system of multiple Artificial Intelligence (AI) agents for Verilog code generation.
The proposed methodology successfully generates 94.2% syntactically and functionally correct Verilog code.
arXiv Detail & Related papers (2024-08-15T20:06:06Z) - Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models [54.51932175059004]
We introduce a scalable method for generating synthetic instructions to enhance the code generation capability of Large Language Models.
The proposed algorithm, Genetic-Instruct, mimics evolutionary processes, utilizing self-instruction to create numerous synthetic samples from a limited number of seeds.
arXiv Detail & Related papers (2024-07-29T20:42:59Z) - A Multi-Expert Large Language Model Architecture for Verilog Code Generation [5.159745269633967]
This paper introduces an innovative multi-expert LLM architecture for Verilog code generation (MEV-LLM)
Our architecture uniquely integrates multiple LLMs, each specifically fine-tuned with a dataset that is categorized with respect to a distinct level of design complexity.
Empirical evidence from experiments highlights notable improvements in terms of the percentage of generated Verilog outputs that are syntactically and functionally correct.
arXiv Detail & Related papers (2024-04-11T16:58:29Z) - Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework [50.02710905062184]
This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts.
The accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% with the same benchmark.
arXiv Detail & Related papers (2024-03-17T13:01:03Z) - StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - Exploring Large Language Models for Code Explanation [3.2570216147409514]
Large Language Models (LLMs) have made remarkable strides in Natural Language Processing.
This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs.
arXiv Detail & Related papers (2023-10-25T14:38:40Z) - VeriGen: A Large Language Model for Verilog Code Generation [22.837558083876743]
We fine-tune pre-existing Large Language Models (LLMs) on Verilog datasets compiled from GitHub and Verilog textbooks.
Here, our fine-tuned open-source CodeGen-16B model outperforms the commercial state-of-the-art GPT-3.5-turbo model with a 1.1% overall increase.
Notably, it demonstrates a 41% improvement in generating syntactically correct Verilog code across various problem categories compared to its pre-trained counterpart.
arXiv Detail & Related papers (2023-07-28T02:57:14Z) - LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z) - Benchmarking Large Language Models for Automated Verilog RTL Code
Generation [21.747037230069854]
We characterize the ability of large language models (LLMs) to generate useful Verilog.
We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code.
Our findings show that across our problem scenarios, the fine-tuning results in LLMs more capable of producing syntactically correct code.
arXiv Detail & Related papers (2022-12-13T16:34:39Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.