Related papers: Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models

Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models

URL: http://arxiv.org/abs/2403.13588v1
Date: Wed, 20 Mar 2024 13:37:00 GMT
Title: Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models
Authors: Chengzhe Feng, Yanan Sun, Ke Li, Pan Zhou, Jiancheng Lv, Aojun Lu,
Abstract summary: We investigate the effectiveness of prompt learning in code intelligence tasks. Existing automatic prompt design methods are very limited to code intelligence tasks. We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
Score: 54.58108387797138
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Pre-trained Language Models (PLMs), a popular approach for code intelligence, continue to grow in size, the computational cost of their usage has become prohibitively expensive. Prompt learning, a recent development in the field of natural language processing, emerges as a potential solution to address this challenge. In this paper, we investigate the effectiveness of prompt learning in code intelligence tasks. We unveil its reliance on manually designed prompts, which often require significant human effort and expertise. Moreover, we discover existing automatic prompt design methods are very limited to code intelligence tasks due to factors including gradient dependence, high computational demands, and limited applicability. To effectively address both issues, we propose Genetic Auto Prompt (GenAP), which utilizes an elaborate genetic algorithm to automatically design prompts. With GenAP, non-experts can effortlessly generate superior prompts compared to meticulously manual-designed ones. GenAP operates without the need for gradients or additional computational costs, rendering it gradient-free and cost-effective. Moreover, GenAP supports both understanding and generation types of code intelligence tasks, exhibiting great applicability. We conduct GenAP on three popular code intelligence PLMs with three canonical code intelligence tasks including defect prediction, code summarization, and code translation. The results suggest that GenAP can effectively automate the process of designing prompts. Specifically, GenAP outperforms all other methods across all three tasks (e.g., improving accuracy by an average of 2.13% for defect prediction). To the best of our knowledge, GenAP is the first work to automatically design prompts for code intelligence PLMs.

Related papers

AutoMCQ -- Automatically Generate Code Comprehension Questions using GenAI [0.0]
Students often do not fully understand the code they have written.<n>Being able to fully understand code is increasingly important in a world where students have access to generative artificial intelligence (GenAI) tools.<n>This paper introduces AutoMCQ, which uses GenAI for the automatic generation of multiple-choice code comprehension questions.
arXiv Detail & Related papers (2025-05-22T09:14:41Z)
Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization [56.674356045200696]
We propose a novel method to train AI agents to incorporate knowledge and skills for multiple tasks without the need for cumbersome note systems or prior high-quality demonstration data. Our approach employs an iterative process where the agent collects new experiences, receives corrective feedback from humans in the form of hints, and integrates this feedback into its weights. We demonstrate the efficacy of our approach by implementing it in a Llama-3-based agent which, after only a few rounds of feedback, outperforms advanced models GPT-4o and DeepSeek-V3 in a taskset.
arXiv Detail & Related papers (2025-02-03T17:45:46Z)
Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing [0.9668407688201359]
generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing. We developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion. We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing.
arXiv Detail & Related papers (2024-10-31T16:48:41Z)
ChatGPT Code Detection: Techniques for Uncovering the Source of Code [0.0]
We use advanced classification techniques to differentiate between code written by humans and that generated by ChatGPT. We employ a new approach that combines powerful embedding features (black-box) with supervised learning algorithms. We show that untrained humans solve the same task not better than random guessing.
arXiv Detail & Related papers (2024-05-24T12:56:18Z)
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs. CodeGRAG builds the graphical view of code blocks based on the control flow and data flow of them to fill the gap between programming languages and natural language. Various experiments and ablations are done on four datasets including both the C++ and python languages to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput. It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results. The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters(FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z)
SelfEvolve: A Code Evolution Framework via Large Language Models [5.6607714367826105]
Large language models (LLMs) have already revolutionized code generation, after being pretrained on publicly available code data. We propose a novel two-step pipeline, called autoknow, that leverages LLMs as both knowledge providers and self-reflective programmers. We evaluate autoknowon three code generation datasets, including DS-1000 for data science code, HumanEval for software engineering code, and TransCoder for C++-to-Python translation.
arXiv Detail & Related papers (2023-06-05T14:12:46Z)
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge [155.81786738036578]
Open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs. Pre-trained Language Models (PLM) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources. We propose RASO: a new VQA pipeline that deploys a generate-then-select strategy guided by world knowledge.
arXiv Detail & Related papers (2023-05-30T08:34:13Z)
Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge [83.55215993730326]
We propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework. Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively.
arXiv Detail & Related papers (2022-03-16T10:37:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.