Related papers: Exploring the Capabilities of LLMs for Code Change Related Tasks

Exploring the Capabilities of LLMs for Code Change Related Tasks

URL: http://arxiv.org/abs/2407.02824v1
Date: Wed, 3 Jul 2024 05:49:18 GMT
Title: Exploring the Capabilities of LLMs for Code Change Related Tasks
Authors: Lishui Fan, Jiakun Liu, Zhongxin Liu, David Lo, Xin Xia, Shanping Li,
Abstract summary: Large language models (LLMs) have shown their effectiveness in code-related tasks. LLMs focus on general code syntax and semantics rather than the differences between two code versions. We conduct an empirical study using textgreater 1B parameters LLMs on three code-change-related tasks.
Score: 14.261870410238643
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Developers deal with code-change-related tasks daily, e.g., reviewing code. Pre-trained code and code-change-oriented models have been adapted to help developers with such tasks. Recently, large language models (LLMs) have shown their effectiveness in code-related tasks. However, existing LLMs for code focus on general code syntax and semantics rather than the differences between two code versions. Thus, it is an open question how LLMs perform on code-change-related tasks. To answer this question, we conduct an empirical study using \textgreater 1B parameters LLMs on three code-change-related tasks, i.e., code review generation, commit message generation, and just-in-time comment update, with in-context learning (ICL) and parameter-efficient fine-tuning (PEFT, including LoRA and prefix-tuning). We observe that the performance of LLMs is poor without examples and generally improves with examples, but more examples do not always lead to better performance. LLMs tuned with LoRA have comparable performance to the state-of-the-art small pre-trained models. Larger models are not always better, but \textsc{Llama~2} and \textsc{Code~Llama} families are always the best. The best LLMs outperform small pre-trained models on the code changes that only modify comments and perform comparably on other code changes. We suggest future work should focus more on guiding LLMs to learn the knowledge specific to the changes related to code rather than comments for code-change-related tasks.

Related papers

Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities. The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z)
Rethinking Code Refinement: Learning to Judge Code Efficiency [60.04718679054704]
Large Language Models (LLMs) have demonstrated impressive capabilities in understanding and generating codes. We propose a novel method based on the code language model that is trained to judge the efficiency between two different codes. We validate our method on multiple programming languages with multiple refinement steps, demonstrating that the proposed method can effectively distinguish between more and less efficient versions of code.
arXiv Detail & Related papers (2024-10-29T06:17:37Z)
Steering Large Language Models between Code Execution and Textual Reasoning [22.279107036500083]
Textual reasoning has inherent limitations in solving tasks with challenges in math, logics, optimization, and searching. The recently released OpenAI GPT Code Interpreter and multi-agent frameworks such as AutoGen have demonstrated remarkable proficiency of integrating code generation and execution. We propose three methods to better steer LLM code/text generation and achieve a notable improvement.
arXiv Detail & Related papers (2024-10-04T15:44:47Z)
Beyond Code Generation: Assessing Code LLM Maturity with Postconditions [9.521621889147362]
We propose a code Large Language Model maturity model based on the postcondition generation problem. We augment the EvalPlus dataset to a postcondition testing benchmark, and evaluate several open-sourced models.
arXiv Detail & Related papers (2024-07-19T08:34:30Z)
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates [77.81663273436375]
We present CodeUpdateArena, a benchmark for knowledge editing in the code domain. An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example. Our benchmark covers updates of various types to 54 functions from seven diverse Python packages.
arXiv Detail & Related papers (2024-07-08T17:55:04Z)
Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agent [2.8391355909797644]
Large language models (LLMs) have significantly improved their ability to perform tasks in the field of code generation. There is still a gap between LLMs being capable coders and being top-tier software engineers.
arXiv Detail & Related papers (2024-05-31T22:06:18Z)
Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code. We find that code prompting exhibits a high-performance boost for multiple LLMs. Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code) Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
Coarse-Tuning Models of Code with Reinforcement Learning Feedback [0.0]
Large Language Models (LLMs) pre-trained on code have emerged as the dominant approach to program synthesis. We propose RLCF, that further trains a pre-trained LLM via reinforcement learning, using feedback from a grounding function that scores the quality of the code.
arXiv Detail & Related papers (2023-05-25T22:09:08Z)
CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization) [7.699967852459232]
Developers tend to consciously and unconsciously have a collection of semantics facts in mind when working on coding tasks. One might assume that the powerful multi-layer architecture of transformer-style LLMs makes them inherently capable of doing this simple level of "code analysis" We evaluate whether automatically augmenting an LLM's prompt with semantic facts explicitly, actually helps.
arXiv Detail & Related papers (2023-04-13T20:49:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.