Let's Discover More API Relations: A Large Language Model-based AI Chain
for Unsupervised API Relation Inference
- URL: http://arxiv.org/abs/2311.01266v1
- Date: Thu, 2 Nov 2023 14:25:00 GMT
- Title: Let's Discover More API Relations: A Large Language Model-based AI Chain
for Unsupervised API Relation Inference
- Authors: Qing Huang, Yanbang Sun, Zhenchang Xing, Yuanlong Cao, Jieshan Chen,
Xiwei Xu, Huan Jin, Jiaxing Lu
- Abstract summary: We propose utilizing large language models (LLMs) as a neural knowledge base for API relation inference.
This approach leverages the entire Web used to pre-train LLMs as a knowledge base and is insensitive to the context and complexity of input texts.
We achieve an average F1 value of 0.76 across the three datasets, significantly higher than the state-of-the-art method's average F1 value of 0.40.
- Score: 19.05884373802318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: APIs have intricate relations that can be described in text and represented
as knowledge graphs to aid software engineering tasks. Existing relation
extraction methods have limitations, such as a limited API text corpus and
sensitivity to the characteristics of the input text. To address these limitations,
we propose utilizing large language models (LLMs) (e.g., GPT-3.5) as a neural
knowledge base for API relation inference. This approach leverages the entire
Web used to pre-train LLMs as a knowledge base and is insensitive to the
context and complexity of input texts. To ensure accurate inference, we design
our analytic flow as an AI Chain with three AI modules: API FQN Parser, API
Knowledge Extractor, and API Relation Decider. The accuracies of the API FQN
Parser and API Relation Decider modules are 0.81 and 0.83, respectively. Using
the generative capacity of the LLM and our approach's inference capability, we
achieve an average F1 value of 0.76 across the three datasets, significantly
higher than the state-of-the-art method's average F1 value of 0.40. Compared to a
CoT-based method, our AI Chain design improves the inference reliability by
67%, and the AI-crowd-intelligence strategy enhances the robustness of our
approach by 26%.
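To make the pipeline concrete, here is a minimal sketch of how such a three-module AI chain could be wired together; the `chat()` helper and all prompts are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the three-module AI chain described above.
# chat() and the prompts are illustrative assumptions, not the paper's code.

def chat(prompt: str) -> str:
    """Stand-in for a call to an LLM such as GPT-3.5."""
    raise NotImplementedError

def parse_fqn(mention: str) -> str:
    # Module 1: API FQN Parser - resolve a mention to a fully qualified name.
    return chat(f"Resolve the API mention '{mention}' to its fully "
                "qualified name. Answer with the FQN only.")

def extract_knowledge(fqn: str) -> str:
    # Module 2: API Knowledge Extractor - elicit what the LLM learned
    # about this API from its Web-scale pre-training.
    return chat(f"Describe the purpose, parameters, and behavior of {fqn}.")

def decide_relation(fqn_a: str, know_a: str, fqn_b: str, know_b: str) -> str:
    # Module 3: API Relation Decider - judge the relation between two APIs.
    return chat(f"Given:\n{fqn_a}: {know_a}\n{fqn_b}: {know_b}\n"
                "What relation holds between these two APIs?")

def infer_relation(mention_a: str, mention_b: str) -> str:
    a, b = parse_fqn(mention_a), parse_fqn(mention_b)
    return decide_relation(a, extract_knowledge(a), b, extract_knowledge(b))
```

Decomposing the task into small single-purpose LLM calls is what the abstract credits for the reliability gain over a single CoT prompt.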
Related papers
- Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning [14.351476383642016]
We propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets.
Code2API does not require additional model training or any manually crafted rules.
It can be easily deployed on personal computers without relying on other external tools.
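As a rough, training-free illustration of CoT-plus-ICL APIzation (the demonstration pair and `chat()` helper below are hypothetical, not Code2API's actual prompts):

```python
# Hypothetical sketch of prompt-based APIzation: one in-context example
# plus a chain-of-thought instruction, with no model training or rules.

def chat(prompt: str) -> str:
    raise NotImplementedError  # stand-in for an LLM call

DEMO = """Snippet:
for (String s : list) System.out.println(s);
Reasoning: the snippet prints each element; input: a List<String>.
API:
public static void printAll(List<String> list) {
    for (String s : list) System.out.println(s);
}"""  # illustrative Java example pair

def apize(snippet: str) -> str:
    prompt = (f"{DEMO}\n\nSnippet:\n{snippet}\n"
              "Reasoning: think step by step about inputs, outputs, and "
              "side effects, then wrap the snippet as a reusable method.\nAPI:")
    return chat(prompt)
```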
arXiv Detail & Related papers (2024-05-06T14:22:17Z)
- APIGen: Generative API Method Recommendation [16.541442856821]
APIGen is a generative API recommendation approach based on enhanced in-context learning (ICL).
APIGen searches for similar posts to the programming queries from the lexical, syntactical, and semantic perspectives.
With the reasoning process, APIGen makes the recommended APIs better meet the programming requirements of queries.
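A toy sketch of the retrieval side of such an approach, assuming a corpus of (post, API) pairs; only lexical (token-overlap) similarity is shown, whereas APIGen also uses syntactic and semantic signals.

```python
# Illustrative retrieval-enhanced ICL prompt construction.
# Only lexical similarity is modeled here; this is a simplification.

def lexical_score(query: str, post: str) -> float:
    q, p = set(query.lower().split()), set(post.lower().split())
    return len(q & p) / max(len(q | p), 1)  # Jaccard overlap

def build_prompt(query: str, corpus: list[tuple[str, str]], k: int = 3) -> str:
    # corpus holds (post, recommended_api) pairs from prior Q&A data.
    ranked = sorted(corpus, key=lambda x: lexical_score(query, x[0]),
                    reverse=True)[:k]
    demos = "\n".join(f"Query: {p}\nAPI: {a}" for p, a in ranked)
    return f"{demos}\nQuery: {query}\nAPI:"
```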
arXiv Detail & Related papers (2024-01-29T02:35:42Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
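As a hedged illustration of what a regular ICLL task might look like (the language and prompt format here are assumptions, not the paper's benchmark):

```python
# Toy regular ICLL task: sample strings from a small regular language
# and ask a sequence model to continue consistently in-context.

import random

def sample_ab_plus(max_reps: int = 5) -> str:
    # Strings from the regular language (ab)+ .
    return "ab" * random.randint(1, max_reps)

def icl_prompt(n_demos: int = 8) -> str:
    demos = " ".join(sample_ab_plus() for _ in range(n_demos))
    # The model must infer the language from the demos and extend it:
    # a well-formed continuation of the trailing 'a' would begin with 'b'.
    return f"{demos} a"
```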
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
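A simplified sketch of the spec-to-values idea, assuming an OpenAPI-style JSON spec and a hypothetical `chat()` LLM helper:

```python
# Hedged sketch of spec-driven value generation in the spirit of RESTGPT.
# The spec layout follows OpenAPI conventions; chat() is an assumed helper.

import json

def chat(prompt: str) -> str:
    raise NotImplementedError

def example_values(spec_path: str) -> dict:
    spec = json.load(open(spec_path))
    values = {}
    for path, ops in spec.get("paths", {}).items():
        for op in ops.values():
            if not isinstance(op, dict):
                continue  # skip path-level keys that are not operations
            for p in op.get("parameters", []):
                desc = p.get("description", "")
                # Ask the LLM to turn the natural-language description
                # into a machine-usable rule plus a concrete example value.
                values[p["name"]] = chat(
                    f"Parameter '{p['name']}': {desc}\n"
                    "State any value constraint, then give one valid example.")
    return values
```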
arXiv Detail & Related papers (2023-12-01T19:53:23Z)
- Let's Chat to Find the APIs: Connecting Human, LLM and Knowledge Graph through AI Chain [21.27256145010061]
We propose a knowledge-guided query clarification approach for API recommendation.
We use a large language model (LLM) guided by knowledge graph (KG) to overcome out-of-vocabulary (OOV) failures.
Our approach is designed as an AI chain that consists of five steps, each handled by a separate LLM call.
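A much-compressed sketch of the knowledge-guided clarification idea (the facet representation and single-step loop below are assumptions; the paper itself uses a five-step chain):

```python
# Assumed simplification: a knowledge graph supplies candidate facets,
# and the LLM asks the user about the ones the query leaves open.

def chat(prompt: str) -> str:
    raise NotImplementedError

def clarify(query: str, kg_facets: dict[str, list[str]]) -> str:
    # kg_facets maps a facet (e.g. "data format") to option values from the KG.
    missing = [f for f in kg_facets if f not in query.lower()]
    if not missing:
        return query  # nothing to clarify
    facet = missing[0]
    # Return the clarifying question to pose back to the user.
    return chat(f"User query: '{query}'. Ask one question to pin down "
                f"the '{facet}' facet (options: {kg_facets[facet]}).")
```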
arXiv Detail & Related papers (2023-09-28T03:31:01Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
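For concreteness, a minimal sketch of demonstration-based OIE prompting, with a hypothetical `chat()` helper and an illustrative demonstration triple:

```python
# Minimal few-shot OIE prompt: one demonstration, then the target sentence.

def chat(prompt: str) -> str:
    raise NotImplementedError  # stand-in for an LLM call

DEMO = ("Sentence: Paris is the capital of France.\n"
        "Triples: (Paris; is the capital of; France)")

def extract_triples(sentence: str) -> str:
    # The model is expected to emit (subject; relation; object) triples.
    return chat(f"{DEMO}\nSentence: {sentence}\nTriples:")
```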
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- On the Effectiveness of Pretrained Models for API Learning [8.788509467038743]
Developers frequently use APIs to implement certain functionalities, such as parsing Excel files or reading and writing text files line by line.
Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner.
Existing approaches utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoder to generate API sequences.
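A minimal information-retrieval baseline of the kind the summary mentions might look like the following; the corpus format and similarity measure are illustrative assumptions, not the paper's models.

```python
# IR-style baseline: return the API sequence whose paired query
# best matches the user's natural-language query.

from difflib import SequenceMatcher

def retrieve_api_sequence(query: str,
                          corpus: list[tuple[str, list[str]]]) -> list[str]:
    # corpus: (natural-language query, API call sequence) pairs.
    best = max(corpus,
               key=lambda qa: SequenceMatcher(None, query, qa[0]).ratio())
    return best[1]

# Example (hypothetical data):
# corpus = [("read a text file line by line",
#            ["FileReader.new", "BufferedReader.new",
#             "BufferedReader.readLine"])]
# retrieve_api_sequence("read file lines", corpus)
```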
arXiv Detail & Related papers (2022-04-05T20:33:24Z)
- Holistic Combination of Structural and Textual Code Information for Context based API Recommendation [28.74546332681778]
We propose a novel API recommendation approach called APIRec-CST (API Recommendation by Combining Structural and Textual code information).
APIRec-CST is a deep learning model that combines the API usage with the text information in source code based on an API Graph Network and a Code Token Network.
We show that our approach achieves a top-1, top-5, and top-10 accuracy and MRR of 60.3%, 81.5%, 87.7%, and 69.4%, and significantly outperforms an existing graph-based statistical approach.
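A hedged sketch of the fusion idea in PyTorch, where one embedding summarizes API-usage graph context and another the code tokens; the layer shapes and single linear head are assumptions, not the paper's architecture.

```python
# Two-view fusion: concatenate graph and token embeddings, score APIs.

import torch
import torch.nn as nn

class FusionRecommender(nn.Module):
    def __init__(self, graph_dim: int, token_dim: int, n_apis: int):
        super().__init__()
        self.score = nn.Linear(graph_dim + token_dim, n_apis)

    def forward(self, graph_emb: torch.Tensor,
                token_emb: torch.Tensor) -> torch.Tensor:
        # Concatenate the two views and rank the API vocabulary.
        fused = torch.cat([graph_emb, token_emb], dim=-1)
        return self.score(fused)  # logits over candidate APIs
```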
arXiv Detail & Related papers (2020-10-15T04:40:42Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
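A toy sketch of one such logical rule, symmetry of comparisons: swapping the compared entities flips the answer, yielding an extra labeled example for free (the rule and helper below are illustrative, not the paper's full method).

```python
# Logic-guided augmentation via the symmetry rule for comparison questions.

def augment_symmetric(question: str, a: str, b: str, label: str) -> tuple:
    # Swap the two compared entities and negate the yes/no label.
    flipped_q = (question.replace(a, "<TMP>")
                         .replace(b, a)
                         .replace("<TMP>", b))
    flipped_label = {"yes": "no", "no": "yes"}[label]
    return flipped_q, flipped_label

# augment_symmetric("Is the Nile longer than the Amazon?",
#                   "the Nile", "the Amazon", "yes")
# -> ("Is the Amazon longer than the Nile?", "no")
```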
arXiv Detail & Related papers (2020-04-21T17:03:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.