A Machine Learning Approach Towards SKILL Code Autocompletion
- URL: http://arxiv.org/abs/2312.01921v2
- Date: Sat, 24 Feb 2024 13:58:26 GMT
- Title: A Machine Learning Approach Towards SKILL Code Autocompletion
- Authors: Enrique Dehaerne, Bappaditya Dey, Wannes Meert
- Abstract summary: This study is the first to apply transformers to SKILL code autocompletion towards improving the productivity of hardware design engineers.
We propose a novel methodology for creating a high-quality SKILL dataset with both unlabeled and labeled data.
We show that models trained using the proposed methodology outperform baselines in terms of human-judgment score and BLEU score.
- Score: 6.586356094533907
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As Moore's Law continues to increase the complexity of electronic systems,
Electronic Design Automation (EDA) must advance to meet global demand. An
important example of an EDA technology is SKILL, a scripting language used to
customize and extend EDA software. Recently, code generation models using the
transformer architecture have achieved impressive results in academic settings
and have even been used in commercial developer tools to improve developer
productivity. To the best of our knowledge, this study is the first to apply
transformers to SKILL code autocompletion towards improving the productivity of
hardware design engineers. In this study, a novel, data-efficient methodology
for generating SKILL code is proposed and experimentally validated. More
specifically, we propose (i) a method for creating a high-quality SKILL dataset
with both unlabeled and labeled data, (ii) a training strategy in which T5
models pre-trained on general programming-language code are fine-tuned on our
custom SKILL dataset using unsupervised and supervised learning, and (iii) an
approach for evaluating synthesized SKILL code. We show that models trained using the
proposed methodology outperform baselines in terms of human-judgment score and
BLEU score. A major challenge was the very limited amount of SKILL code
available for training a transformer model. Despite our validated improvements,
this small dataset was still not enough to train a model that can reliably
autocomplete SKILL code. We discuss this and other limitations as well as future
work that
could address these limitations.
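The BLEU evaluation mentioned in the abstract can be sketched with a minimal, single-reference implementation. This is an illustrative approximation in the standard library only; the paper's exact BLEU variant, smoothing, and tokenization are not specified here, and the SKILL snippet (`add2`) is a hypothetical example, not the authors' scoring code.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU for a single reference: geometric mean of
    clipped n-gram precisions, scaled by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate too short to form n-grams
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # brevity penalty discourages overly short candidates
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(
        1 - len(reference) / len(candidate))
    return bp * math.exp(log_mean)

# Hypothetical SKILL-like token sequences (whitespace-tokenized)
ref = "procedure ( add2 ( a b ) a + b )".split()
hyp = "procedure ( add2 ( a b ) a + b )".split()
print(sentence_bleu(ref, hyp))  # exact match -> 1.0
```

In practice, tokenization choices matter a great deal for code (e.g. whether parentheses and operators are separate tokens), which is one reason the authors pair BLEU with a human-judgment score.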
Related papers
- A Transformer-Based Approach for Smart Invocation of Automatic Code Completion [14.34818742116731]
We develop a machine learning model that can predict when to invoke a code completion tool.
We collect a dataset of 200k developer interactions with our cross-IDE code completion plugin.
Our results indicate that our small-scale transformer model significantly outperforms the baseline.
arXiv Detail & Related papers (2024-05-23T16:19:32Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation [32.178931149612644]
Model Improvement via Neuron Targeting (MINT) is a novel approach for repairing code Language Models (LMs).
MINT is effective, efficient, and reliable, capable of correcting a neural model by patching a minimum number of neurons.
arXiv Detail & Related papers (2023-12-08T20:28:08Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation [9.477734501499274]
We present TransformCode, a novel framework that learns code embeddings in a contrastive learning manner.
Our framework is encoder-agnostic and language-agnostic, which means that it can leverage any encoder model and handle any programming language.
arXiv Detail & Related papers (2023-11-10T09:05:23Z)
- PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Assemble Foundation Models for Automatic Code Summarization [9.53949558569201]
We propose a flexible and robust approach for automatic code summarization based on neural networks.
We assemble available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo.
We introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning.
arXiv Detail & Related papers (2022-01-13T21:38:33Z)
- Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
- PHOTONAI -- A Python API for Rapid Machine Learning Model Development [2.414341608751139]
PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development.
It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences.
arXiv Detail & Related papers (2020-02-13T10:33:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.