A Deep Learning Framework for Verilog Autocompletion Towards Design and
Verification Automation
- URL: http://arxiv.org/abs/2304.13840v2
- Date: Wed, 7 Jun 2023 09:33:04 GMT
- Title: A Deep Learning Framework for Verilog Autocompletion Towards Design and
Verification Automation
- Authors: Enrique Dehaerne and Bappaditya Dey and Sandip Halder and Stefan De
Gendt
- Abstract summary: This paper proposes a novel deep learning framework for training a Verilog autocompletion model.
The framework involves integrating models pretrained on general programming language data and finetuning them on a dataset curated to be similar to a target downstream task.
Experiments demonstrate that the proposed framework achieves better BLEU, ROUGE-L, and chrF scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from scratch.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Innovative Electronic Design Automation (EDA) solutions are important to meet
the design requirements for increasingly complex electronic devices. Verilog, a
hardware description language, is widely used for the design and verification
of digital circuits and is synthesized using specific EDA tools. However,
writing code is a repetitive and time-intensive task. This paper proposes,
primarily, a novel deep learning framework for training a Verilog
autocompletion model and, secondarily, a Verilog dataset of files and snippets
obtained from open-source repositories. The framework involves integrating
models pretrained on general programming language data and finetuning them on a
dataset curated to be similar to a target downstream task. This is validated by
comparing different pretrained models trained on different subsets of the
proposed Verilog dataset using multiple evaluation metrics. These experiments
demonstrate that the proposed framework achieves better BLEU, ROUGE-L, and chrF
scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from
scratch. Code and data are made available at:
https://github.com/99EnriqueD/verilog_autocompletion .
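The framework pairs a model pretrained on general programming-language data with finetuning on curated Verilog text, which maps directly onto standard causal-language-modeling tooling. The sketch below is a minimal illustration of that recipe, not the authors' released code (see the repository linked above); the checkpoint name, data path, and hyperparameters are assumptions chosen for illustration.

```python
# Minimal sketch of the pretrain-then-finetune recipe, assuming a GPT-style
# causal LM pretrained on general code and a plain-text file of curated
# Verilog snippets (one training sample per line). Checkpoint, path, and
# hyperparameters are illustrative assumptions, not the paper's settings.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

checkpoint = "Salesforce/codegen-350M-multi"  # any code-pretrained causal LM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token     # reuse EOS for padding
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Hypothetical path to the curated Verilog snippet corpus.
dataset = load_dataset("text", data_files={"train": "verilog_snippets.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="verilog-autocomplete",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    # mlm=False gives the next-token (autocompletion) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

After finetuning, calling `model.generate` on a partial Verilog module yields candidate completions that can be scored against held-out references.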
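The reported BLEU, ROUGE-L, and chrF comparisons can be reproduced in spirit with off-the-shelf metric implementations. A small sketch using the Hugging Face `evaluate` package follows; the toy prediction/reference pair is invented, and the authors' exact evaluation pipeline may differ.

```python
# Scoring generated completions against references with the three metrics
# named in the abstract, via the `evaluate` package. The example strings
# are toy data, not from the paper's test set.
import evaluate

predictions = ["assign out = a & b;"]    # model-generated completions
references = [["assign out = a & b;"]]   # one reference list per prediction

bleu = evaluate.load("bleu").compute(predictions=predictions,
                                     references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions,
                                       references=[r[0] for r in references])
chrf = evaluate.load("chrf").compute(predictions=predictions,
                                     references=references)

print(f"BLEU:    {bleu['bleu']:.3f}")
print(f"ROUGE-L: {rouge['rougeL']:.3f}")
print(f"chrF:    {chrf['score']:.1f}")
```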
Related papers
- CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair [4.554742043916029]
This paper first presents an analysis of fine-tuned LLMs on Verilog coding, with synthetic data from prior methods.
We identify two main issues: difficulties in handling non-textual representations and significant variability during training with models randomly making "minor" mistakes.
Our fine-tuned Starcoder2-15B outperforms prior state-of-the-art results by 3.8%, 10.9%, and 6.6% for pass@1 on VerilogEval-Machine, VerilogEval-Human, and RTLLM, respectively.
arXiv Detail & Related papers (2024-09-19T12:15:55Z)
- Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework [50.02710905062184]
This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts.
The accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% on the same benchmark.
arXiv Detail & Related papers (2024-03-17T13:01:03Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- BetterV: Controlled Verilog Generation with Discriminative Guidance [11.162807308782751]
We propose a Verilog generation framework, BetterV, which fine-tunes large language models (LLMs) on processed domain-specific datasets.
BetterV has the ability to generate syntactically and functionally correct Verilog, which can outperform GPT-4 on the VerilogEval benchmark.
arXiv Detail & Related papers (2024-02-03T08:00:12Z)
- Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023 [18.616716369775883]
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments.
The dataset consists of 9048 code comments and surrounding code snippet pairs extracted from open-source C-based projects.
The labels generated from large language models increase the bias in the prediction model but lead to less over-fitted results.
arXiv Detail & Related papers (2023-10-27T14:13:23Z)
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework, named ALBench, for evaluating active learning in object detection.
Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)