Embedding Code Contexts for Cryptographic API Suggestion:New
Methodologies and Comparisons
- URL: http://arxiv.org/abs/2103.08747v2
- Date: Thu, 18 Mar 2021 02:16:07 GMT
- Title: Embedding Code Contexts for Cryptographic API Suggestion:New
Methodologies and Comparisons
- Authors: Ya Xiao, Salman Ahmed, Wenjia Song, Xinyang Ge, Bimal Viswanath,
Danfeng Yao
- Abstract summary: We present a new neural network-based approach, Multi-HyLSTM for API recommendation.
We use program analysis to guide the API embedding and recommendation.
In an analysis of 245 test cases, compared with the commercial tool Codota, we achieve a top-1 recommendation accuracy of 88.98%.
- Score: 9.011910726620536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent research efforts, the vision of automatic code generation
through API recommendation has not been realized. Accuracy and expressiveness
challenges of API recommendation needs to be systematically addressed. We
present a new neural network-based approach, Multi-HyLSTM for API
recommendation --targeting cryptography-related code. Multi-HyLSTM leverages
program analysis to guide the API embedding and recommendation. By analyzing
the data dependence paths of API methods, we train embedding and specialize a
multi-path neural network architecture for API recommendation tasks that
accurately predict the next API method call. We address two previously
unreported programming language-specific challenges, differentiating
functionally similar APIs and capturing low-frequency long-range influences.
Our results confirm the effectiveness of our design choices, including
program-analysis-guided embedding, multi-path code suggestion architecture, and
low-frequency long-range-enhanced sequence learning, with high accuracy on
top-1 recommendations. We achieve a top-1 accuracy of 91.41% compared with
77.44% from the state-of-the-art tool SLANG. In an analysis of 245 test cases,
compared with the commercial tool Codota, we achieve a top-1 recommendation
accuracy of 88.98%, which is significantly better than Codota's accuracy of
64.90%. We publish our data and code as a large Java cryptographic code
dataset.
Related papers
- A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z) - DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model [90.71963723884944]
Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research.
We introduce DiffAgent, an agent designed to screen the accurate selection in seconds via API calls.
Our evaluations reveal that DiffAgent not only excels in identifying the appropriate T2I API but also underscores the effectiveness of the SFTA training framework.
arXiv Detail & Related papers (2024-03-31T06:28:15Z) - Compositional API Recommendation for Library-Oriented Code Generation [23.355509276291198]
We propose CAPIR, which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements.
We present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation)
Experimental results on these benchmarks, demonstrate the effectiveness of CAPIR in comparison to existing baselines.
arXiv Detail & Related papers (2024-02-29T18:27:27Z) - APIGen: Generative API Method Recommendation [16.541442856821]
APIGen is a generative API recommendation approach through enhanced in-context learning (ICL)
APIGen searches for similar posts to the programming queries from the lexical, syntactical, and semantic perspectives.
With the reasoning process, APIGen makes recommended APIs better meet the programming requirement of queries.
arXiv Detail & Related papers (2024-01-29T02:35:42Z) - APICom: Automatic API Completion via Prompt Learning and Adversarial
Training-based Data Augmentation [6.029137544885093]
API recommendation is the process of assisting developers in finding the required API among numerous candidate APIs.
Previous studies mainly modeled API recommendation as the recommendation task, and developers may not yet be able to find what they need.
Motivated by the neural machine translation research domain, we can model this problem as the generation task.
We propose a novel approach APICom based on prompt learning, which can generate API related to the query according to the prompts.
arXiv Detail & Related papers (2023-09-13T15:31:50Z) - Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimize task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one P query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z) - Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer ranking.
arXiv Detail & Related papers (2022-08-24T02:34:27Z) - On the Effectiveness of Pretrained Models for API Learning [8.788509467038743]
Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc.
Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner.
Existing approaches utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoder to generate API sequences.
arXiv Detail & Related papers (2022-04-05T20:33:24Z) - ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z) - Holistic Combination of Structural and Textual Code Information for
Context based API Recommendation [28.74546332681778]
We propose a novel API recommendation approach called APIRec-CST (API Recommendation by Combining Structural and Textual code information)
APIRec-CST is a deep learning model that combines the API usage with the text information in source code based on an API Graph Network and a Code Token Network.
We show that our approach achieves a top-5, top-10 accuracy and MRR of 60.3%, 81.5%, 87.7% and 69.4%, and significantly outperforms an existing graph-based statistical approach.
arXiv Detail & Related papers (2020-10-15T04:40:42Z) - ScopeFlow: Dynamic Scene Scoping for Optical Flow [94.42139459221784]
We propose to modify the common training protocols of optical flow.
The improvement is based on observing the bias in sampling challenging data.
We find that both regularization and augmentation should decrease during the training protocol.
arXiv Detail & Related papers (2020-02-25T09:58:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.