Compositional API Recommendation for Library-Oriented Code Generation
- URL: http://arxiv.org/abs/2402.19431v1
- Date: Thu, 29 Feb 2024 18:27:27 GMT
- Title: Compositional API Recommendation for Library-Oriented Code Generation
- Authors: Zexiong Ma, Shengnan An, Bing Xie, Zeqi Lin
- Abstract summary: We propose CAPIR, which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements.
We present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation).
Experimental results on these benchmarks demonstrate the effectiveness of CAPIR in comparison to existing baselines.
- Score: 23.355509276291198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have achieved exceptional performance in code
generation. However, the performance remains unsatisfactory in generating
library-oriented code, especially for the libraries not present in the training
data of LLMs. Previous work utilizes API recommendation technology to help LLMs
use libraries: it retrieves APIs related to the user requirements, then
leverages them as context to prompt LLMs. However, developmental requirements
can be coarse-grained, requiring a combination of multiple fine-grained APIs.
This granularity inconsistency makes API recommendation a challenging task. To
address this, we propose CAPIR (Compositional API Recommendation), which adopts
a "divide-and-conquer" strategy to recommend APIs for coarse-grained
requirements. Specifically, CAPIR employs an LLM-based Decomposer to break down
a coarse-grained task description into several detailed subtasks. Then, CAPIR
applies an embedding-based Retriever to identify relevant APIs corresponding to
each subtask. Moreover, CAPIR leverages an LLM-based Reranker to filter out
redundant APIs and provides the final recommendation. To facilitate the
evaluation of API recommendation methods on coarse-grained requirements, we
present two challenging benchmarks, RAPID (Recommend APIs based on
Documentation) and LOCG (Library-Oriented Code Generation). Experimental
results on these benchmarks demonstrate the effectiveness of CAPIR in
comparison to existing baselines. Specifically, on RAPID's Torchdata-AR
dataset, compared to the state-of-the-art API recommendation approach, CAPIR
improves recall@5 from 18.7% to 43.2% and precision@5 from 15.5% to 37.1%. On
LOCG's Torchdata-Code dataset, compared to code generation without API
recommendation, CAPIR improves pass@100 from 16.0% to 28.0%.
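The decompose, retrieve, and rerank flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the API catalogue is hypothetical, the Decomposer and Reranker are rule-based stand-ins for the LLM-based components, and the Retriever uses a toy bag-of-words cosine similarity in place of a learned embedding model. The precision@k and recall@k helper uses a common definition that may differ in detail from the paper's.

```python
import math
import re
from collections import Counter

# Hypothetical API documentation, for illustration only (not from RAPID or LOCG).
API_DOCS = {
    "read_csv": "read rows from a csv file",
    "filter_rows": "filter rows that match a condition",
    "shuffle_rows": "shuffle rows in random order",
    "batch_rows": "group rows into batches of fixed size",
}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def decompose(task):
    """Stand-in for the LLM-based Decomposer: split on ', then', ' and ', or ','."""
    parts = re.split(r",\s*then\s+|\s+and\s+|,\s*", task)
    return [p.strip() for p in parts if p.strip()]


def retrieve(subtask, k=2):
    """Return up to k (api_name, score) pairs most similar to the subtask."""
    query = embed(subtask)
    scored = [(cosine(query, embed(doc)), name) for name, doc in API_DOCS.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k] if score > 0]


def rerank(candidates, top_n=5):
    """Stand-in for the LLM-based Reranker: drop duplicates, keep the best-scored APIs."""
    best = {}
    for name, score in candidates:
        best[name] = max(score, best.get(name, 0.0))
    return sorted(best, key=lambda n: (-best[n], n))[:top_n]


def recommend(task, k=2, top_n=5):
    """Decompose the task, retrieve APIs per subtask, then rerank the pooled candidates."""
    candidates = []
    for subtask in decompose(task):
        candidates.extend(retrieve(subtask, k))
    return rerank(candidates, top_n)


def precision_recall_at_k(recommended, relevant, k):
    """Precision@k = hits / k; recall@k = hits / number of relevant APIs."""
    hits = len(set(recommended[:k]) & set(relevant))
    recall = hits / len(relevant) if relevant else 0.0
    return hits / k, recall


if __name__ == "__main__":
    task = "read a csv file, then shuffle rows and group rows into batches"
    apis = recommend(task, k=2, top_n=3)
    print(apis)
    print(precision_recall_at_k(apis, ["read_csv", "shuffle_rows", "batch_rows"], 3))
```

Retrieving per subtask rather than for the whole coarse-grained description is what lets each fine-grained API surface. In this sketch the pooled candidates also include a loosely matching API (`filter_rows`), which the reranking step drops from the final top three.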
Related papers
- AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation [16.590226868986296]
AutoFeedback is a framework for efficient and accurate API request generation.
It implements two feedback loops during the process of generating API requests by the Large Language Models.
It achieves an accuracy of 100.00% on a real-world API dataset and reduces the cost of interaction with GPT-3.5 Turbo by 23.44%, and GPT-4 Turbo by 11.85%.
(2024-10-09T14:38:28Z)
- A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
(2024-09-20T03:12:35Z)
- FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability.
Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request.
We propose an output-side optimization approach called FANTASE to address these limitations.
(2024-07-18T23:44:02Z)
- A Solution-based LLM API-using Methodology for Academic Information Seeking [49.096714812902576]
SoAy is a solution-based LLM API-using methodology for academic information seeking.
It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence.
Results show a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines.
(2024-05-24T02:44:14Z)
- APIGen: Generative API Method Recommendation [16.541442856821]
APIGen is a generative API recommendation approach through enhanced in-context learning (ICL).
APIGen searches for similar posts to the programming queries from the lexical, syntactical, and semantic perspectives.
With the reasoning process, APIGen makes recommended APIs better meet the programming requirement of queries.
(2024-01-29T02:35:42Z)
- Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
(2023-12-01T19:53:23Z)
- APICom: Automatic API Completion via Prompt Learning and Adversarial Training-based Data Augmentation [6.029137544885093]
API recommendation is the process of assisting developers in finding the required API among numerous candidate APIs.
Previous studies mainly modeled API recommendation as the recommendation task, and developers may not yet be able to find what they need.
Motivated by the neural machine translation research domain, we can model this problem as the generation task.
We propose a novel approach, APICom, based on prompt learning, which can generate APIs related to the query according to the prompts.
(2023-09-13T15:31:50Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
(2023-07-28T07:43:13Z)
- Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
(2023-05-10T16:40:52Z)
- Holistic Combination of Structural and Textual Code Information for Context based API Recommendation [28.74546332681778]
We propose a novel API recommendation approach called APIRec-CST (API Recommendation by Combining Structural and Textual code information).
APIRec-CST is a deep learning model that combines the API usage with the text information in source code based on an API Graph Network and a Code Token Network.
We show that our approach achieves a top-1, top-5, top-10 accuracy and MRR of 60.3%, 81.5%, 87.7% and 69.4%, and significantly outperforms an existing graph-based statistical approach.
(2020-10-15T04:40:42Z)