AutoBench: Automatic Testbench Generation and Evaluation Using LLMs for HDL Design
- URL: http://arxiv.org/abs/2407.03891v2
- Date: Tue, 20 Aug 2024 09:19:07 GMT
- Title: AutoBench: Automatic Testbench Generation and Evaluation Using LLMs for HDL Design
- Authors: Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li,
- Abstract summary: Testbenches constitute the cornerstone of simulation-based hardware verification.
Large Language Models (LLMs) have demonstrated their potential in automating the circuit design flow.
We introduce AutoBench, the first LLM-based testbench generator for digital circuit design.
- Score: 6.414167153186868
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In digital circuit design, testbenches constitute the cornerstone of simulation-based hardware verification. Traditional methodologies for testbench generation during simulation-based hardware verification still remain partially manual, resulting in inefficiencies in testing various scenarios and requiring expensive time from designers. Large Language Models (LLMs) have demonstrated their potential in automating the circuit design flow. However, directly applying LLMs to generate testbenches suffers from a low pass rate. To address this challenge, we introduce AutoBench, the first LLM-based testbench generator for digital circuit design, which requires only the description of the design under test (DUT) to automatically generate comprehensive testbenches. In AutoBench, a hybrid testbench structure and a self-checking system are realized using LLMs. To validate the generated testbenches, we also introduce an automated testbench evaluation framework to evaluate the quality of generated testbenches from multiple perspectives. Experimental results demonstrate that AutoBench achieves a 57% improvement in the testbench pass@1 ratio compared with the baseline that directly generates testbenches using LLMs. For 75 sequential circuits, AutoBench successfully has a 3.36 times testbench pass@1 ratio compared with the baseline. The source codes and experimental results are open-sourced at this link: https://github.com/AutoBench/AutoBench
Related papers
- CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design [6.414167153186868]
We propose CorrectBench, an automatic testbench generation framework with functional self-validation and self-correction.
The proposed approach can validate the correctness of the generated testbenches with a success rate of 88.85%.
Our work's performance is 62.18% higher than previous work in sequential tasks and almost 5 times the pass ratio of the direct method.
arXiv Detail & Related papers (2024-11-13T10:45:19Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration [7.261063083251448]
We present a complete framework for calibrating and administering a robust large-scale computerized adaptive test (CAT) with a small number of responses.
We use AutoIRT, a new method that uses automated machine learning (AutoML) in combination with item response theory (IRT)
We propose the BanditCAT framework, a methodology motivated by casting the problem in the contextual bandit framework and utilizing item response theory (IRT)
arXiv Detail & Related papers (2024-10-28T13:54:10Z) - A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites [1.4563527353943984]
Large Language Models (LLMs) have been applied to various aspects of software development.
We present AgoneTest: an automated system for generating test suites for Java projects.
arXiv Detail & Related papers (2024-08-14T23:02:16Z) - LiveBench: A Challenging, Contamination-Free LLM Benchmark [101.21578097087699]
We release LiveBench, the first benchmark that contains frequently-updated questions from recent information sources.
We evaluate many prominent closed-source models, as well as dozens of open-source models ranging from 0.5B to 110B in size.
Questions will be added and updated on a monthly basis, and we will release new tasks and harder versions of tasks over time.
arXiv Detail & Related papers (2024-06-27T16:47:42Z) - Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models [54.14602121129874]
We introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data.
AutoIF transforms the validation of instruction-following data quality into code verification.
arXiv Detail & Related papers (2024-06-19T13:29:53Z) - Automatic benchmarking of large multimodal models via iterative experiment programming [71.78089106671581]
We present APEx, the first framework for automatic benchmarking of LMMs.
Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand.
The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions.
arXiv Detail & Related papers (2024-06-18T06:43:46Z) - LLM-Powered Test Case Generation for Detecting Tricky Bugs [30.82169191775785]
AID generates test inputs and oracles targeting plausibly correct programs.
We evaluate AID on two large-scale datasets with tricky bugs: TrickyBugs and EvalPlus.
The evaluation results show that the recall, precision, and F1 score of AID outperform the state-of-the-art by up to 1.80x, 2.65x, and 1.66x, respectively.
arXiv Detail & Related papers (2024-04-16T06:20:06Z) - Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers [121.53749383203792]
We present a holistic end-to-end solution for annotating the factuality of large language models (LLMs)-generated responses.
We construct an open-domain document-level factuality benchmark in three-level granularity: claim, sentence and document.
Preliminary experiments show that FacTool, FactScore and Perplexity are struggling to identify false claims.
arXiv Detail & Related papers (2023-11-15T14:41:57Z) - Test-Time Training with Masked Autoencoders [54.983147122777574]
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision.
In this paper, we use masked autoencoders for this one-sample learning problem.
arXiv Detail & Related papers (2022-09-15T17:59:34Z) - SilGAN: Generating driving maneuvers for scenario-based
software-in-the-loop testing [0.0]
SilGAN is a deep generative model that eases specification, stimulus generation, and automation of automotive software-in-the-loop testing.
The model is trained using data recorded from vehicles in the field.
arXiv Detail & Related papers (2021-07-05T07:17:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.