On the Effectiveness of Pretrained Models for API Learning
- URL: http://arxiv.org/abs/2204.03498v1
- Date: Tue, 5 Apr 2022 20:33:24 GMT
- Title: On the Effectiveness of Pretrained Models for API Learning
- Authors: Mohammad Abdul Hadi, Imam Nur Bani Yusuf, Ferdian Thung, Kien Gia
Luong, Jiang Lingxiao, Fatemeh H. Fard, David Lo
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers frequently use APIs to implement certain functionalities, such as
parsing Excel files, reading and writing text files line by line, etc.
Developers can greatly benefit from automatic API usage sequence generation
based on natural language queries for building applications in a faster and
cleaner manner. Existing approaches utilize information retrieval models to
search for matching API sequences given a query, or use an RNN-based
encoder-decoder to generate API sequences. As it stands, the first approach
treats queries and API names as bags of words and thus lacks a deep
comprehension of the semantics of the queries. The latter approach adapts a
neural language model to encode a user query into a fixed-length context
vector and generate API sequences from the context vector.
We want to understand the effectiveness of recent Pre-trained
Transformer-based Models (PTMs) for the API learning task. These PTMs are
trained on large
natural language corpora in an unsupervised manner to retain contextual
knowledge about the language and have found success in solving similar Natural
Language Processing (NLP) problems. However, the applicability of PTMs has not
yet been explored for the API sequence generation task. We use a dataset that
contains 7 million annotations collected from GitHub to evaluate the PTMs
empirically. This dataset was also used to assess previous approaches. Based on
our results, PTMs generate more accurate API sequences and outperform other
related methods by around 11%. We have also identified two different
tokenization approaches that can contribute to a significant boost in PTMs'
performance for the API sequence generation task.
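To make the task concrete, the sketch below shows query-to-API-sequence generation with an off-the-shelf pretrained encoder-decoder via Hugging Face Transformers. This is a minimal illustration, not the paper's setup: "t5-small" stands in for a PTM fine-tuned on the GitHub-mined annotation/API-sequence pairs, and the example query and decoding settings are assumptions.

```python
# Minimal sketch: generate an API usage sequence from a natural language
# query with a pretrained encoder-decoder. "t5-small" is a stand-in for a
# checkpoint fine-tuned on (annotation, API sequence) pairs.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Subword tokenization splits unseen API names into known pieces, the kind
# of tokenization decision the abstract credits with a performance boost.
print(tokenizer.tokenize("BufferedReader.readLine"))

query = "read a text file line by line"
inputs = tokenizer(query, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64, num_beams=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Without fine-tuning, the stand-in checkpoint will of course not emit meaningful API sequences; the point is only the shape of the pipeline.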
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves retrieval performance competitive with state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z) - KAT: Dependency-aware Automated API Testing with Large Language Models [1.7264233311359707]
KAT (Katalon API Testing) is a novel AI-driven approach that autonomously generates test cases to validate APIs.
Our evaluation of KAT using 12 real-world services shows that it can improve validation coverage, detect more undocumented status codes, and reduce false positives in these services.
arXiv Detail & Related papers (2024-07-14T14:48:18Z) - Contextual API Completion for Unseen Repositories Using LLMs [6.518508607788089]
We introduce a novel technique to mitigate hallucinations by leveraging global and local contextual information within a code repository for API completion tasks.
Our approach is tailored to refine code completion tasks, with a focus on optimizing local API completions.
Our tool, LANCE, surpasses Copilot by 143% and 142% for API token completion and conversational API completion, respectively.
arXiv Detail & Related papers (2024-05-07T18:22:28Z) - APIGen: Generative API Method Recommendation [16.541442856821]
APIGen is a generative API recommendation approach based on enhanced in-context learning (ICL).
APIGen searches for posts similar to the programming query from the lexical, syntactical, and semantic perspectives.
With its reasoning process, APIGen makes the recommended APIs better meet the programming requirements of the query.
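As a rough sketch of this enhanced-ICL recipe, the snippet below turns retrieved query/API pairs into a few-shot prompt for a code model; the helper name, demonstration posts, and prompt wording are hypothetical rather than APIGen's actual format.

```python
# Hypothetical few-shot prompt assembly in the spirit of APIGen's enhanced
# in-context learning: retrieved (query, API) posts become demonstrations.
def build_icl_prompt(query, similar_posts):
    parts = ["Recommend an API method for the final query.", ""]
    for post_query, post_api in similar_posts:
        parts += [f"Query: {post_query}", f"API: {post_api}", ""]
    parts += [f"Query: {query}", "API:"]
    return "\n".join(parts)

prompt = build_icl_prompt(
    "parse a date string in Java",
    [("convert a string to a date", "java.text.SimpleDateFormat.parse"),
     ("format a date as text", "java.text.SimpleDateFormat.format")],
)
print(prompt)  # fed to an LLM, whose completion is the recommended API
```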
arXiv Detail & Related papers (2024-01-29T02:35:42Z) - Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
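As a loose sketch of the value-generation step in such a pipeline, assuming the official OpenAI Python client is available: the model name, parameter description, and prompt wording below are illustrative assumptions, not RESTGPT's implementation.

```python
# Sketch: ask an LLM for plausible example values for one REST parameter,
# in the spirit of RESTGPT's value generation. Prompt and model are assumed.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

param_description = "Comma-separated list of ISO 3166-1 alpha-2 country codes."
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Given this REST API parameter description, list three "
                   "realistic example values, one per line:\n"
                   + param_description,
    }],
)
print(response.choices[0].message.content)
```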
arXiv Detail & Related papers (2023-12-01T19:53:23Z) - APICom: Automatic API Completion via Prompt Learning and Adversarial
Training-based Data Augmentation [6.029137544885093]
API recommendation is the process of assisting developers in finding the required API among numerous candidate APIs.
Previous studies mainly modeled API recommendation as a retrieval-style recommendation task, with which developers may still fail to find the APIs they need.
Motivated by research in neural machine translation, we instead model this problem as a generation task.
We propose APICom, a novel approach based on prompt learning that can generate APIs related to a query according to the prompts.
arXiv Detail & Related papers (2023-09-13T15:31:50Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
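A small sketch of this budget-friendly recipe: retrieve candidates cheaply with BM25, then re-rank only those candidates by embedding similarity. The rank-bm25 package is one common BM25 implementation, and embed() is a placeholder for whichever commercial embedding API is being evaluated.

```python
# Sketch: BM25 retrieval followed by embedding-based re-ranking. embed() is
# a placeholder for a commercial embedding API; here it returns seeded
# random vectors just so the snippet runs end to end.
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = [
    "read a text file line by line",
    "parse JSON in Python",
    "sort a list of dictionaries by key",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "how to read text files"
candidates = bm25.get_top_n(query.lower().split(), corpus, n=2)

def embed(text):
    rng = np.random.default_rng(len(text))  # placeholder, not a real API call
    return rng.standard_normal(8)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Re-rank only the BM25 candidates, so embedding-API cost scales with the
# candidate pool instead of the whole corpus.
q_vec = embed(query)
for doc in sorted(candidates, key=lambda d: cosine(q_vec, embed(d)), reverse=True):
    print(doc)
```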
arXiv Detail & Related papers (2023-05-10T16:40:52Z) - Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex identifies the parts of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z) - Compositional Generalization for Natural Language Interfaces to Web APIs [26.851998759793453]
This paper presents Okapi, a new dataset for Natural Language to executable web Application Programming Interfaces (NL2API).
This dataset is in English and contains 22,508 questions and 9,019 unique API calls, covering three domains.
We define new compositional generalization tasks for NL2API which explore the models' ability to extrapolate from simple API calls in the training set to new and more complex API calls in the inference phase.
arXiv Detail & Related papers (2021-12-09T20:49:01Z)