How and Why LLMs Use Deprecated APIs in Code Completion? An Empirical Study
- URL: http://arxiv.org/abs/2406.09834v2
- Date: Wed, 3 Jul 2024 06:37:04 GMT
- Title: How and Why LLMs Use Deprecated APIs in Code Completion? An Empirical Study
- Authors: Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng,
- Abstract summary: In large language models (LLMs), pre-trained or fine-tuned on large code corpora, code completion may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries.
This study involved seven advanced LLMs, 145 API mappings from eight popular Python libraries, and 28,125 completion prompts.
We propose two lightweight fixing approaches, textscReplaceAPI and textscInsertPrompt, which can serve as baseline approaches for future research.
- Score: 13.633501449498402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs, the specific problem of deprecated API usage in LLM-based code completion has not been thoroughly investigated. To address this gap, we conducted the first evaluation study on deprecated API usage in LLM-based code completion. This study involved seven advanced LLMs, 145 API mappings from eight popular Python libraries, and 28,125 completion prompts. The study results reveal the \textit{status quo} and \textit{root causes} of deprecated API usage in LLM-based code completion from the perspectives of \textit{model}, \textit{prompt}, and \textit{library}. Based on these findings, we propose two lightweight fixing approaches, \textsc{ReplaceAPI} and \textsc{InsertPrompt}, which can serve as baseline approaches for future research on mitigating deprecated API usage in LLM-based completion. Additionally, we provide implications for future research on integrating library evolution with LLM-driven software development.
Related papers
- A Solution-based LLM API-using Methodology for Academic Information Seeking [49.096714812902576]
SoAy is a solution-based LLM API-using methodology for academic information seeking.
It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence.
Results show a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines.
arXiv Detail & Related papers (2024-05-24T02:44:14Z) - Compositional API Recommendation for Library-Oriented Code Generation [23.355509276291198]
We propose CAPIR, which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements.
We present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation)
Experimental results on these benchmarks, demonstrate the effectiveness of CAPIR in comparison to existing baselines.
arXiv Detail & Related papers (2024-02-29T18:27:27Z) - De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding [18.129031749321058]
Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks.
LLMs are mostly unaware of the code that exists within a specific project, preventing the models from making good use of existing APIs.
This paper presents De-Hallucinator, a technique that grounds the predictions of an LLM through a novel combination of retrieving suitable API references.
arXiv Detail & Related papers (2024-01-03T12:09:43Z) - Making Large Language Models A Better Foundation For Dense Retrieval [19.38740248464456]
Dense retrieval needs to learn discriminative text embeddings to represent the semantic relationship between query and document.
It may benefit from the using of large language models (LLMs), given LLMs' strong capability on semantic understanding.
We propose LLaRA (LLM adapted for dense RetrievAl), which works as a post-hoc adaptation of dense retrieval application.
arXiv Detail & Related papers (2023-12-24T15:10:35Z) - (Why) Is My Prompt Getting Worse? Rethinking Regression Testing for
Evolving LLM APIs [8.403074015356594]
Large Language Models (LLMs) are increasingly integrated into software applications.
LLMs are often updated silently and scheduled to be deprecated.
This can cause performance regression and affect prompt design choices.
arXiv Detail & Related papers (2023-11-18T17:11:12Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z) - LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs)
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.