On the Effectiveness of Pretrained Models for API Learning
- URL: http://arxiv.org/abs/2204.03498v1
- Date: Tue, 5 Apr 2022 20:33:24 GMT
- Title: On the Effectiveness of Pretrained Models for API Learning
- Authors: Mohammad Abdul Hadi, Imam Nur Bani Yusuf, Ferdian Thung, Kien Gia
Luong, Jiang Lingxiao, Fatemeh H. Fard, David Lo
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers frequently use APIs to implement certain functionalities, such as
parsing Excel files, reading and writing text files line by line, etc.
Developers can greatly benefit from automatic API usage sequence generation
based on natural language queries for building applications in a faster and
cleaner manner. Existing approaches utilize information retrieval models to
search for matching API sequences given a query, or use an RNN-based
encoder-decoder to generate API sequences. As it stands, the first approach
treats queries and API names as bags of words and thus lacks a deep
comprehension of the semantics of the queries. The latter approach adapts a
neural language model to encode a user query into a fixed-length context
vector and generate API sequences from the context vector.
We want to understand the effectiveness of recent Pre-trained
Transformer-based Models (PTMs) for the API learning task. These PTMs are
trained on large
natural language corpora in an unsupervised manner to retain contextual
knowledge about the language and have found success in solving similar Natural
Language Processing (NLP) problems. However, the applicability of PTMs has not
yet been explored for the API sequence generation task. We use a dataset that
contains 7 million annotations collected from GitHub to evaluate the PTMs
empirically. This dataset was also used to assess previous approaches. Based on
our results, PTMs generate more accurate API sequences and outperform other
related methods by around 11%. We have also identified two different
tokenization approaches that can contribute to a significant boost in PTMs'
performance for the API sequence generation task.
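To make the task concrete, the sketch below shows query-to-API-sequence generation with an off-the-shelf pretrained encoder-decoder via Hugging Face Transformers. This is a minimal illustration, not the paper's setup: "t5-small" stands in for a PTM fine-tuned on the GitHub-mined annotation/API-sequence pairs, and the example query and decoding settings are assumptions.

```python
# Minimal sketch: generate an API usage sequence from a natural language
# query with a pretrained encoder-decoder. "t5-small" is a stand-in for a
# checkpoint fine-tuned on (annotation, API sequence) pairs.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Subword tokenization splits unseen API names into known pieces, the kind
# of tokenization decision the abstract credits with a performance boost.
print(tokenizer.tokenize("BufferedReader.readLine"))

query = "read a text file line by line"
inputs = tokenizer(query, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64, num_beams=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Without fine-tuning, the stand-in checkpoint will of course not emit meaningful API sequences; the point is only the shape of the pipeline.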
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves retrieval performance competitive with state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z) - KAT: Dependency-aware Automated API Testing with Large Language Models [1.7264233311359707]
KAT (Katalon API Testing) is a novel AI-driven approach that autonomously generates test cases to validate APIs.
Our evaluation of KAT using 12 real-world services shows that it can improve validation coverage, detect more undocumented status codes, and reduce false positives in these services.
arXiv Detail & Related papers (2024-07-14T14:48:18Z) - Contextual API Completion for Unseen Repositories Using LLMs [6.518508607788089]
We introduce a novel technique to mitigate hallucinations by leveraging global and local contextual information within a code repository for API completion tasks.
Our approach is tailored to refine code completion tasks, with a focus on optimizing local API completions.
Our tool, LANCE, surpasses Copilot by 143% and 142% for API token completion and conversational API completion, respectively.
arXiv Detail & Related papers (2024-05-07T18:22:28Z) - APIGen: Generative API Method Recommendation [16.541442856821]
APIGen is a generative API recommendation approach based on enhanced in-context learning (ICL).
APIGen searches for posts similar to the programming query from the lexical, syntactical, and semantic perspectives.
With its reasoning process, APIGen makes the recommended APIs better meet the programming requirements of the query.
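As a rough sketch of this enhanced-ICL recipe, the snippet below turns retrieved query/API pairs into a few-shot prompt for a code model; the helper name, demonstration posts, and prompt wording are hypothetical rather than APIGen's actual format.

```python
# Hypothetical few-shot prompt assembly in the spirit of APIGen's enhanced
# in-context learning: retrieved (query, API) posts become demonstrations.
def build_icl_prompt(query, similar_posts):
    parts = ["Recommend an API method for the final query.", ""]
    for post_query, post_api in similar_posts:
        parts += [f"Query: {post_query}", f"API: {post_api}", ""]
    parts += [f"Query: {query}", "API:"]
    return "\n".join(parts)

prompt = build_icl_prompt(
    "parse a date string in Java",
    [("convert a string to a date", "java.text.SimpleDateFormat.parse"),
     ("format a date as text", "java.text.SimpleDateFormat.format")],
)
print(prompt)  # fed to an LLM, whose completion is the recommended API
```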
arXiv Detail & Related papers (2024-01-29T02:35:42Z) - Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
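As a loose sketch of the value-generation step in such a pipeline, assuming the official OpenAI Python client is available: the model name, parameter description, and prompt wording below are illustrative assumptions, not RESTGPT's implementation.

```python
# Sketch: ask an LLM for plausible example values for one REST parameter,
# in the spirit of RESTGPT's value generation. Prompt and model are assumed.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

param_description = "Comma-separated list of ISO 3166-1 alpha-2 country codes."
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Given this REST API parameter description, list three "
                   "realistic example values, one per line:\n"
                   + param_description,
    }],
)
print(response.choices[0].message.content)
```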
arXiv Detail & Related papers (2023-12-01T19:53:23Z) - APICom: Automatic API Completion via Prompt Learning and Adversarial
Training-based Data Augmentation [6.029137544885093]
API recommendation is the process of assisting developers in finding the required API among numerous candidate APIs.
Previous studies mainly modeled API recommendation as a retrieval-style recommendation task, with which developers may still fail to find the APIs they need.
Motivated by research in neural machine translation, we instead model this problem as a generation task.
We propose APICom, a novel approach based on prompt learning that can generate APIs related to a query according to the prompts.
arXiv Detail & Related papers (2023-09-13T15:31:50Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
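A small sketch of this budget-friendly recipe: retrieve candidates cheaply with BM25, then re-rank only those candidates by embedding similarity. The rank-bm25 package is one common BM25 implementation, and embed() is a placeholder for whichever commercial embedding API is being evaluated.

```python
# Sketch: BM25 retrieval followed by embedding-based re-ranking. embed() is
# a placeholder for a commercial embedding API; here it returns seeded
# random vectors just so the snippet runs end to end.
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = [
    "read a text file line by line",
    "parse JSON in Python",
    "sort a list of dictionaries by key",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "how to read text files"
candidates = bm25.get_top_n(query.lower().split(), corpus, n=2)

def embed(text):
    rng = np.random.default_rng(len(text))  # placeholder, not a real API call
    return rng.standard_normal(8)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Re-rank only the BM25 candidates, so embedding-API cost scales with the
# candidate pool instead of the whole corpus.
q_vec = embed(query)
for doc in sorted(candidates, key=lambda d: cosine(q_vec, embed(d)), reverse=True):
    print(doc)
```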
arXiv Detail & Related papers (2023-05-10T16:40:52Z) - Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex identifies the parts of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z) - Compositional Generalization for Natural Language Interfaces to Web APIs [26.851998759793453]
This paper presents Okapi, a new dataset for Natural Language to executable web Application Programming Interfaces (NL2API).
This dataset is in English and contains 22,508 questions and 9,019 unique API calls, covering three domains.
We define new compositional generalization tasks for NL2API which explore the models' ability to extrapolate from simple API calls in the training set to new and more complex API calls in the inference phase.
arXiv Detail & Related papers (2021-12-09T20:49:01Z)