A Deep Learning Framework for Verilog Autocompletion Towards Design and
Verification Automation
- URL: http://arxiv.org/abs/2304.13840v2
- Date: Wed, 7 Jun 2023 09:33:04 GMT
- Title: A Deep Learning Framework for Verilog Autocompletion Towards Design and
Verification Automation
- Authors: Enrique Dehaerne and Bappaditya Dey and Sandip Halder and Stefan De
Gendt
- Abstract summary: This paper proposes a novel deep learning framework for training a Verilog autocompletion model.
The framework involves integrating models pretrained on general programming language data and finetuning them on a dataset curated to be similar to a target downstream task.
Experiments demonstrate that the proposed framework achieves better BLEU, ROUGE-L, and chrF scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from scratch.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Innovative Electronic Design Automation (EDA) solutions are important to meet
the design requirements for increasingly complex electronic devices. Verilog, a
hardware description language, is widely used for the design and verification
of digital circuits and is synthesized using specific EDA tools. However,
writing code is a repetitive and time-intensive task. This paper proposes,
primarily, a novel deep learning framework for training a Verilog
autocompletion model and, secondarily, a Verilog dataset of files and snippets
obtained from open-source repositories. The framework involves integrating
models pretrained on general programming language data and finetuning them on a
dataset curated to be similar to a target downstream task. This is validated by
comparing different pretrained models trained on different subsets of the
proposed Verilog dataset using multiple evaluation metrics. These experiments
demonstrate that the proposed framework achieves better BLEU, ROUGE-L, and chrF
scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from
scratch. Code and data are made available at:
https://github.com/99EnriqueD/verilog_autocompletion .
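The framework pairs a model pretrained on general programming-language data with finetuning on curated Verilog text, which maps directly onto standard causal-language-modeling tooling. The sketch below is a minimal illustration of that recipe, not the authors' released code (see the repository linked above); the checkpoint name, data path, and hyperparameters are assumptions chosen for illustration.

```python
# Minimal sketch of the pretrain-then-finetune recipe, assuming a GPT-style
# causal LM pretrained on general code and a plain-text file of curated
# Verilog snippets (one training sample per line). Checkpoint, path, and
# hyperparameters are illustrative assumptions, not the paper's settings.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

checkpoint = "Salesforce/codegen-350M-multi"  # any code-pretrained causal LM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token     # reuse EOS for padding
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Hypothetical path to the curated Verilog snippet corpus.
dataset = load_dataset("text", data_files={"train": "verilog_snippets.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="verilog-autocomplete",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    # mlm=False gives the next-token (autocompletion) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

After finetuning, calling `model.generate` on a partial Verilog module yields candidate completions that can be scored against held-out references.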
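The reported BLEU, ROUGE-L, and chrF comparisons can be reproduced in spirit with off-the-shelf metric implementations. A small sketch using the Hugging Face `evaluate` package follows; the toy prediction/reference pair is invented, and the authors' exact evaluation pipeline may differ.

```python
# Scoring generated completions against references with the three metrics
# named in the abstract, via the `evaluate` package. The example strings
# are toy data, not from the paper's test set.
import evaluate

predictions = ["assign out = a & b;"]    # model-generated completions
references = [["assign out = a & b;"]]   # one reference list per prediction

bleu = evaluate.load("bleu").compute(predictions=predictions,
                                     references=references)
rouge = evaluate.load("rouge").compute(predictions=predictions,
                                       references=[r[0] for r in references])
chrf = evaluate.load("chrf").compute(predictions=predictions,
                                     references=references)

print(f"BLEU:    {bleu['bleu']:.3f}")
print(f"ROUGE-L: {rouge['rougeL']:.3f}")
print(f"chrF:    {chrf['score']:.1f}")
```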
Related papers
- CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair [4.554742043916029]
This paper first presents an analysis of fine-tuned LLMs on Verilog coding, with synthetic data from prior methods.
We identify two main issues: difficulties in handling non-textual representations and significant variability during training with models randomly making "minor" mistakes.
Our fine-tuned Starcoder2-15B outperforms prior state-of-the-art results by 3.8%, 10.9%, and 6.6% for pass@1 on VerilogEval-Machine, VerilogEval-Human, and RTLLM, respectively.
arXiv Detail & Related papers (2024-09-19T12:15:55Z)
- Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework [50.02710905062184]
This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts.
The accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% on the same benchmark.
arXiv Detail & Related papers (2024-03-17T13:01:03Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- BetterV: Controlled Verilog Generation with Discriminative Guidance [11.162807308782751]
We propose a Verilog generation framework, BetterV, which fine-tunes large language models (LLMs) on processed domain-specific datasets.
BetterV has the ability to generate syntactically and functionally correct Verilog, which can outperform GPT-4 on the VerilogEval benchmark.
arXiv Detail & Related papers (2024-02-03T08:00:12Z)
- Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023 [18.616716369775883]
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments.
The dataset consists of 9048 code comments and surrounding code snippet pairs extracted from open-source C-based projects.
The labels generated from large language models increase the bias in the prediction model but lead to less over-fitted results.
arXiv Detail & Related papers (2023-10-27T14:13:23Z)
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework, named ALBench, for evaluating active learning in object detection.
Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)