Code Generation for Unknown Libraries via Reading API Documentations
- URL: http://arxiv.org/abs/2202.07806v1
- Date: Wed, 16 Feb 2022 00:36:33 GMT
- Title: Code Generation for Unknown Libraries via Reading API Documentations
- Authors: Koki Washio and Yusuke Miyao
- Abstract summary: We consider the challenge of code generation for unknown libraries without additional training.
We implement a model that can extract relevant code signatures from API documentations based on a natural language intent.
- Score: 10.122354606820416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-domain code generation is a challenging problem because the set of functions and classes that we use is frequently changed and extended in programming communities. We consider the challenge of code generation for unknown libraries without additional training. In this paper, we explore a framework of code generation that can refer to relevant API documentations, like human programmers, to handle unknown libraries. As a first step in this direction, we implement a model that can extract relevant code signatures from API documentations based on a natural language intent and copy primitives from the extracted signatures. Moreover, to evaluate code generation for unknown libraries and our framework, we extend an existing dataset of open-domain code generation and resplit it so that the evaluation data consist only of examples using libraries that do not appear in the training data. Experiments on our new split show that baseline encoder-decoder models cannot generate code using primitives of unknown libraries as expected. In contrast, our model outperforms the baseline on the new split and can properly generate unknown primitives when the extracted code signatures are noiseless.
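The extraction-and-copy step described in the abstract can be read as a small retrieval problem over documentation entries: score each documented signature against the natural language intent, keep the best matches, and expose their identifiers to a copy mechanism in the decoder. The following is a minimal sketch under that reading only; the ApiDoc structure, the lexical scorer, and all function names here are hypothetical and are not the authors' implementation.

```python
# Hypothetical sketch: pick code signatures from API documentation that match an
# intent, then collect their primitives as copy candidates for a decoder.
from dataclasses import dataclass

@dataclass
class ApiDoc:
    signature: str    # e.g. "unknownlib.load_table(path, sep=None)"
    description: str  # prose attached to the signature in the documentation

def lexical_score(intent: str, doc: ApiDoc) -> float:
    """Crude token overlap between the intent and the signature plus description."""
    intent_tokens = set(intent.lower().split())
    text = (doc.signature + " " + doc.description).lower()
    doc_tokens = set(text.replace("(", " ").replace(")", " ").replace(",", " ").split())
    return len(intent_tokens & doc_tokens) / (len(intent_tokens) or 1)

def extract_signatures(intent: str, docs: list[ApiDoc], k: int = 1) -> list[ApiDoc]:
    """Keep the k documentation entries that best match the intent."""
    return sorted(docs, key=lambda d: lexical_score(intent, d), reverse=True)[:k]

def copy_candidates(extracted: list[ApiDoc]) -> set[str]:
    """Primitives (function and argument names) the decoder is allowed to copy."""
    names = set()
    for doc in extracted:
        head, _, args = doc.signature.partition("(")
        names.add(head.strip())
        names.update(a.split("=")[0].strip() for a in args.rstrip(")").split(",") if a.strip())
    return names

docs = [
    ApiDoc("unknownlib.load_table(path, sep=None)", "Read a delimited text file into a table."),
    ApiDoc("unknownlib.plot_hist(values, bins=10)", "Draw a histogram of the given values."),
]
print(copy_candidates(extract_signatures("read a csv file into a table", docs)))
# e.g. {'unknownlib.load_table', 'path', 'sep'}
```

In the paper, the decoder would then choose between generating vocabulary tokens and copying these primitives; here the copy step is reduced to collecting candidate names.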
Related papers
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
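The benchmark above evaluates models that receive contexts retrieved from one or more sources alongside the task. The sketch below only illustrates that prompt-assembly pattern; the retriever functions and the generate callable are placeholders, not part of CodeRAG-Bench.

```python
# Illustrative retrieval-augmented code generation over several context sources.
# `retrievers` maps a source name to a function returning relevant snippets;
# `generate` stands in for any code-generation model.
from typing import Callable, Mapping, Sequence

def rag_codegen(task: str,
                retrievers: Mapping[str, Callable[[str], Sequence[str]]],
                generate: Callable[[str], str],
                per_source: int = 2) -> str:
    sections = []
    for source, retrieve in retrievers.items():
        snippets = list(retrieve(task))[:per_source]  # top snippets from this source
        if snippets:
            sections.append(f"# Context from {source}:\n" + "\n".join(snippets))
    prompt = "\n\n".join(sections + [f"# Task: {task}"])
    return generate(prompt)
```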
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
- Generation-Augmented Query Expansion For Code Retrieval [51.20943646688115]
We propose a generation-augmented query expansion framework, inspired by the human retrieval process of sketching an answer before searching.
We achieve new state-of-the-art results on the CodeSearchNet benchmark.
arXiv Detail & Related papers (2022-12-20T23:49:37Z)
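Concretely, the idea summarized above is to let a generator draft a provisional answer from the natural language query and to search with the query plus that draft rather than the query alone. The helpers below are hypothetical placeholders, not the paper's system.

```python
# Illustrative generation-augmented query expansion for code retrieval.
# `draft_code` stands in for a generator, `search_index` for a code retriever.
from typing import Callable, List

def expand_query(nl_query: str, draft_code: Callable[[str], str]) -> str:
    """Sketch an answer first, then append it to the query used for retrieval."""
    sketch = draft_code(nl_query)      # provisional code written from the query
    return nl_query + "\n" + sketch    # expanded query seen by the retriever

def retrieve_code(nl_query: str,
                  draft_code: Callable[[str], str],
                  search_index: Callable[[str], List[str]]) -> List[str]:
    return search_index(expand_query(nl_query, draft_code))
```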
- When Language Model Meets Private Library [25.610036042971043]
In practice, it is common for programmers to write code using private libraries.
This is a challenge for language models since they have never seen private APIs during training.
We propose a novel framework with two modules: the APIRetriever finds useful APIs, and then the APICoder generates code using these APIs.
arXiv Detail & Related papers (2022-10-31T11:42:06Z)
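The two-module design above separates finding the right private APIs from generating code that uses them. A minimal pipeline in that shape might look like the following; the function bodies and the prompt format are assumptions, not the released implementation.

```python
# Hypothetical two-stage pipeline in the shape described above: retrieve
# candidate private APIs first, then generate code conditioned on them.
from typing import Callable, List, Sequence

def api_retriever(intent: str,
                  private_apis: Sequence[str],
                  score: Callable[[str, str], float],
                  k: int = 3) -> List[str]:
    """Stage 1: rank private API signatures/descriptions against the intent."""
    return sorted(private_apis, key=lambda api: score(intent, api), reverse=True)[:k]

def api_coder(intent: str, apis: Sequence[str], generate: Callable[[str], str]) -> str:
    """Stage 2: generate code with the selected APIs placed in the prompt."""
    prompt = "\n".join(["# Available private APIs:"] + list(apis) + [f"# Intent: {intent}"])
    return generate(prompt)

def private_library_codegen(intent, private_apis, score, generate):
    return api_coder(intent, api_retriever(intent, private_apis, score), generate)
```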
- DocCoder: Generating Code by Retrieving and Reading Docs [87.88474546826913]
We introduce DocCoder, an approach that explicitly leverages code manuals and documentation.
Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model.
arXiv Detail & Related papers (2022-07-13T06:47:51Z)
- CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation [46.45445767488915]
We show how to leverage an unlabelled code corpus to train a model for library-oriented code generation.
We craft two benchmarks named PandasEval and NumpyEval to evaluate library-oriented code generation.
arXiv Detail & Related papers (2022-06-14T14:44:34Z)
- Retrieve and Refine: Exemplar-based Neural Comment Generation [27.90756259321855]
Comments of similar code snippets are helpful for comment generation.
We design a novel seq2seq neural network that takes the given code, its AST, its similar code, and its exemplar as input.
We evaluate our approach on a large-scale Java corpus, which contains about 2M samples.
arXiv Detail & Related papers (2020-10-09T09:33:10Z)
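The last entry's model consumes four inputs at once: the target code, its AST, a retrieved similar snippet, and that snippet's comment used as an exemplar. The sketch below only shows how such inputs might be assembled before encoding; the paper works on Java, so the Python ast parse and the retrieval step here are stand-ins rather than the paper's architecture.

```python
# Illustrative input assembly for exemplar-based comment generation: pair the
# target code with its AST, a similar snippet, and that snippet's comment.
import ast
from typing import Callable, Sequence, Tuple

def build_inputs(code: str,
                 corpus: Sequence[Tuple[str, str]],  # (code, comment) pairs
                 similarity: Callable[[str, str], float]) -> dict:
    tree = ast.dump(ast.parse(code))  # serialized AST of the target code
    similar_code, exemplar_comment = max(corpus, key=lambda pair: similarity(code, pair[0]))
    return {
        "code": code,
        "ast": tree,
        "similar_code": similar_code,
        "exemplar": exemplar_comment,  # comment of the retrieved snippet
    }
# A seq2seq model would then encode these fields jointly and decode a comment.
```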
This list is automatically generated from the titles and abstracts of the papers on this site.