Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models
- URL: http://arxiv.org/abs/2403.13588v1
- Date: Wed, 20 Mar 2024 13:37:00 GMT
- Title: Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models
- Authors: Chengzhe Feng, Yanan Sun, Ke Li, Pan Zhou, Jiancheng Lv, Aojun Lu,
- Abstract summary: We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods are very limited to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
- Score: 54.58108387797138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Pre-trained Language Models (PLMs), a popular approach for code intelligence, continue to grow in size, the computational cost of their usage has become prohibitively expensive. Prompt learning, a recent development in the field of natural language processing, emerges as a potential solution to address this challenge. In this paper, we investigate the effectiveness of prompt learning in code intelligence tasks. We unveil its reliance on manually designed prompts, which often require significant human effort and expertise. Moreover, we discover existing automatic prompt design methods are very limited to code intelligence tasks due to factors including gradient dependence, high computational demands, and limited applicability. To effectively address both issues, we propose Genetic Auto Prompt (GenAP), which utilizes an elaborate genetic algorithm to automatically design prompts. With GenAP, non-experts can effortlessly generate superior prompts compared to meticulously manual-designed ones. GenAP operates without the need for gradients or additional computational costs, rendering it gradient-free and cost-effective. Moreover, GenAP supports both understanding and generation types of code intelligence tasks, exhibiting great applicability. We conduct GenAP on three popular code intelligence PLMs with three canonical code intelligence tasks including defect prediction, code summarization, and code translation. The results suggest that GenAP can effectively automate the process of designing prompts. Specifically, GenAP outperforms all other methods across all three tasks (e.g., improving accuracy by an average of 2.13% for defect prediction). To the best of our knowledge, GenAP is the first work to automatically design prompts for code intelligence PLMs.
Related papers
- Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing [0.9668407688201359]
generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing.
We developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion.
We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing.
arXiv Detail & Related papers (2024-10-31T16:48:41Z) - ChatGPT Code Detection: Techniques for Uncovering the Source of Code [0.0]
We use advanced classification techniques to differentiate between code written by humans and that generated by ChatGPT.
We employ a new approach that combines powerful embedding features (black-box) with supervised learning algorithms.
We show that untrained humans solve the same task not better than random guessing.
arXiv Detail & Related papers (2024-05-24T12:56:18Z) - CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds the graphical view of code blocks based on the control flow and data flow of them to fill the gap between programming languages and natural language.
Various experiments and ablations are done on four datasets including both the C++ and python languages to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z) - Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput.
It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results.
The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters(FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z) - SelfEvolve: A Code Evolution Framework via Large Language Models [5.6607714367826105]
Large language models (LLMs) have already revolutionized code generation, after being pretrained on publicly available code data.
We propose a novel two-step pipeline, called autoknow, that leverages LLMs as both knowledge providers and self-reflective programmers.
We evaluate autoknowon three code generation datasets, including DS-1000 for data science code, HumanEval for software engineering code, and TransCoder for C++-to-Python translation.
arXiv Detail & Related papers (2023-06-05T14:12:46Z) - Generate then Select: Open-ended Visual Question Answering Guided by
World Knowledge [155.81786738036578]
Open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs.
Pre-trained Language Models (PLM) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources.
We propose RASO: a new VQA pipeline that deploys a generate-then-select strategy guided by world knowledge.
arXiv Detail & Related papers (2023-05-30T08:34:13Z) - Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z) - TegTok: Augmenting Text Generation via Task-specific and Open-world
Knowledge [83.55215993730326]
We propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework.
Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively.
arXiv Detail & Related papers (2022-03-16T10:37:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.