Related papers: Evaluating LLMs for Hardware Design and Test

URL: http://arxiv.org/abs/2405.02326v1
Date: Tue, 23 Apr 2024 18:55:49 GMT
Title: Evaluating LLMs for Hardware Design and Test
Authors: Jason Blocklove, Siddharth Garg, Ramesh Karri, Hammond Pearce
Abstract summary: Large Language Models (LLMs) have demonstrated capabilities for producing code in Hardware Description Languages (HDLs). We examine the capabilities and limitations of state-of-the-art conversational LLMs when producing Verilog for functional and verification purposes.
Score: 25.412044293834715
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Models (LLMs) have demonstrated capabilities for producing code in Hardware Description Languages (HDLs). However, most of the focus remains on their abilities to write functional code, not test code. The hardware design process consists of both design and test, and so eschewing validation and verification leaves considerable potential benefit unexplored, given that a design and test framework may allow for progress towards full automation of the digital design pipeline. In this work, we perform one of the first studies exploring how an LLM can both design and test hardware modules from provided specifications. Using a suite of 8 representative benchmarks, we examined the capabilities and limitations of state-of-the-art conversational LLMs when producing Verilog for functional and verification purposes. We taped out the benchmarks on a Skywater 130nm shuttle and received the functional chip.

Related papers

Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs [53.00384299879513]
In large language models (LLMs), code and reasoning reinforce each other. Code provides verifiable execution paths, enforces logical decomposition, and enables runtime validation. We identify key challenges and propose future research directions to strengthen this synergy.
arXiv Detail & Related papers (2025-02-26T18:55:42Z)
FVEval: Understanding Language Model Capabilities in Formal Verification of Digital Hardware [4.480157114854711]
We present FVEval, the first comprehensive benchmark for characterizing the performance of large language models (LLMs) in tasks pertaining to formal verification (FV). The benchmark consists of three sub-tasks that measure LLM capabilities at different levels. We present both collections of expert-written verification collateral and methodologies to scalably generate synthetic examples aligned with FV.
arXiv Detail & Related papers (2024-10-15T21:48:57Z)
Studying and Benchmarking Large Language Models For Log Level Suggestion [49.176736212364496]
Large Language Models (LLMs) have become a focal point of research across various domains. This paper investigates the impact of characteristics and learning paradigms on the performance of 12 open-source LLMs in log level suggestion.
arXiv Detail & Related papers (2024-10-11T03:52:17Z)
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation [48.11754113512047]
Our pipeline works in a fully automated manner, enabling push-button construction from code repositories into formatted subjects under study. The contributions of this study include a code generation benchmark dataset, DOMAINEVAL, encompassing six popular domains; a fully automated pipeline for constructing code benchmarks; and an identification of the limitations of LLMs in code generation tasks based on their performance on DOMAINEVAL.
arXiv Detail & Related papers (2024-08-23T16:33:58Z)
VerilogReader: LLM-Aided Hardware Test Generation [5.012023213660125]
Large Language Models (LLMs), with their advanced understanding and inference capabilities, have introduced a novel approach. In this work, we investigate the integration of LLMs into the Coverage Directed Test Generation (CDG) process. We compare our framework with random testing, using our self-designed Verilog benchmark suite.
arXiv Detail & Related papers (2024-06-03T07:20:51Z)
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
To our knowledge, InfiBench is the first large-scale freeform question-answering (QA) benchmark for code. It comprises 234 carefully selected high-quality Stack Overflow questions that span 15 programming languages. We conduct a systematic evaluation of over 100 of the latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering [74.99736967448423]
We construct Design2Code - the first real-world benchmark for this task. We manually curate 484 diverse real-world webpages as test cases and develop a set of automatic evaluation metrics. Our fine-grained break-down metrics indicate that models mostly lag in recalling visual elements from the input webpages and generating correct layout designs.
arXiv Detail & Related papers (2024-03-05T17:56:27Z)
UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing [27.45301385265713]
We present a large-scale dataset, UniTSyn, which is capable of enhancing the prowess of LLMs for Unit Test Synthesis. By leveraging the Language Server Protocol, UniTSyn achieves the challenging goal of collecting focal-test pairs without per-project execution setups or per-language setups. Experiments demonstrate that, by building an autoregressive model based on UniTSyn, we can achieve significant benefits in learning and understanding unit test representations.
arXiv Detail & Related papers (2024-02-04T22:48:05Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code). Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation [74.7163199054881]
Large Language Models (LLMs) have demonstrated their capability in context understanding, logic reasoning and answer generation. We present a systematic study on the application of LLMs in the EDA field. We highlight the future research direction, focusing on applying LLMs in logic synthesis, physical design, multi-modal feature extraction and alignment of circuits.
arXiv Detail & Related papers (2023-12-28T15:09:14Z)
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation [4.9931630484957585]
Hardware design verification (DV) is a process that checks the functional equivalence of a hardware design against its specifications. A key task in the DV process is the test stimuli generation, which creates a set of conditions or inputs for testing. We propose an open-source benchmarking framework named LLM4DV that efficiently orchestrates LLMs for automated hardware test stimuli generation.
arXiv Detail & Related papers (2023-10-06T19:02:04Z)
ChipGPT: How far are we from natural language hardware design [34.22592995908168]
This work attempts to demonstrate an automated design environment that explores LLMs to generate hardware logic designs from natural language specifications. We present a scalable four-stage zero-code logic design framework based on LLMs without retraining or finetuning.
arXiv Detail & Related papers (2023-05-23T12:54:02Z)
Benchmarking Large Language Models for Automated Verilog RTL Code Generation [21.747037230069854]
We characterize the ability of large language models (LLMs) to generate useful Verilog. We construct an evaluation framework comprising test-benches for functional analysis and a flow to test the syntax of Verilog code. Our findings show that, across our problem scenarios, fine-tuning results in LLMs that are more capable of producing syntactically correct code.
arXiv Detail & Related papers (2022-12-13T16:34:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.