CodeScholar: Growing Idiomatic Code Examples
- URL: http://arxiv.org/abs/2312.15157v1
- Date: Sat, 23 Dec 2023 04:06:15 GMT
- Title: CodeScholar: Growing Idiomatic Code Examples
- Authors: Manish Shetty, Koushik Sen, Ion Stoica
- Abstract summary: We present CodeScholar, a tool that generates idiomatic code examples demonstrating the common usage of API methods.
It includes a novel neural-guided search technique over graphs that grows the query APIs into idiomatic code examples.
We show that CodeScholar helps not only developers but also LLM-powered programming assistants generate correct code in a program synthesis setting.
- Score: 26.298684667238994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Programmers often search for usage examples for API methods. A tool that
could generate realistic, idiomatic, and contextual usage examples for one or
more APIs would be immensely beneficial to developers. Such a tool would
relieve the need for a deep understanding of the API landscape, augment
existing documentation, and help discover interactions among APIs. We present
CodeScholar, a tool that generates idiomatic code examples demonstrating the
common usage of API methods. It includes a novel neural-guided search technique
over graphs that grows the query APIs into idiomatic code examples. Our user
study demonstrates that in 70% of cases, developers prefer CodeScholar-generated
examples over those produced by state-of-the-art large language models (LLMs)
like GPT-3.5. We quantitatively evaluate 60 single-API and 25 multi-API queries
from 6 popular Python libraries and show that, across the board, CodeScholar
generates more realistic, diverse, and concise examples. In addition, we show
that CodeScholar helps not only developers but also LLM-powered programming
assistants generate correct code in a program synthesis setting.
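To make the abstract's search idea concrete, the sketch below (Python, the language the paper evaluates on) mimics the "grow the query APIs into idiomatic code examples" loop in miniature: it seeds candidates with statements that mention the query API, repeatedly expands them with co-occurring statements from a toy corpus, and keeps the highest-scoring candidates. The toy corpus, the frequency-based scorer, and all helper names are illustrative assumptions; CodeScholar itself searches over program graphs with a learned neural model rather than the crude frequency heuristic used here.

    from collections import Counter

    # Toy "corpus": each program is a list of statements, standing in for the
    # program graphs mined from real code (an assumption for illustration only).
    CORPUS = [
        ["arr = np.arange(10)", "idx = np.random.choice(arr, 3)", "print(idx)"],
        ["arr = np.arange(5)", "idx = np.random.choice(arr, 2, replace=False)"],
        ["vals = np.random.choice([1, 2, 3], size=4)", "print(vals.sum())"],
    ]

    def grow_examples(query_api, corpus, beam_width=2, max_steps=3):
        """Grow candidate examples outward from statements that use the query API."""
        # Seed: minimal candidates containing only the query API call.
        frontier = [[s] for prog in corpus for s in prog if query_api in s]

        for _ in range(max_steps):
            candidates = []
            for example in frontier:
                for prog in corpus:
                    if not all(s in prog for s in example):
                        continue
                    # Expand by one statement that co-occurs in the same program,
                    # keeping the program's statement order in the grown example.
                    for s in prog:
                        if s not in example:
                            candidates.append([t for t in prog if t in example or t == s])
            if not candidates:
                break
            # Stand-in for neural guidance: rank expansions by how often their
            # statements appear across the corpus, i.e. how "idiomatic" they look.
            freq = Counter(s for prog in corpus for s in prog)
            candidates.sort(key=lambda ex: sum(freq[s] for s in ex), reverse=True)
            frontier = candidates[:beam_width]

        return ["\n".join(ex) for ex in frontier]

    print(grow_examples("np.random.choice", CORPUS)[0])

Run on the toy corpus, this prints a small, ordered snippet built around np.random.choice; the beam-search structure (expand, score, prune) is the part that loosely mirrors the neural-guided search described above, with the learned scoring model replaced by a frequency count.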
Related papers
- A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models [14.665460257371164]
Large language models (LLMs) like GitHub Copilot and ChatGPT have emerged as powerful tools for code generation.
We propose AutoAPIEval, a framework designed to evaluate the capabilities of LLMs in API-oriented code generation.
arXiv Detail & Related papers (2024-09-23T17:22:09Z)
- A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z)
- CodeUpdateArena: Benchmarking Knowledge Editing on API Updates [77.81663273436375]
We present CodeUpdateArena, a benchmark for knowledge editing in the code domain.
An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example.
Our benchmark covers updates of various types to 54 functions from seven diverse Python packages.
arXiv Detail & Related papers (2024-07-08T17:55:04Z)
- Contextual API Completion for Unseen Repositories Using LLMs [6.518508607788089]
We introduce a novel technique to mitigate hallucinations by leveraging global and local contextual information within a code repository for API completion tasks.
Our approach is tailored to refine code completion tasks, with a focus on optimizing local API completions.
Our tool, LANCE, surpasses Copilot by 143% and 142% for API token completion and conversational API completion, respectively.
arXiv Detail & Related papers (2024-05-07T18:22:28Z)
- Exploring the Impact of Source Code Linearity on the Programmers Comprehension of API Code Examples [0.0]
We investigated whether the (a) linearity and (b) length of the source code in API code examples affect users' performance in terms of correctness and time spent.
We conducted an online controlled code comprehension experiment with 61 Java developers.
arXiv Detail & Related papers (2024-04-03T00:40:38Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
- COMEX: A Tool for Generating Customized Source Code Representations [7.151800146054561]
COMEX is a framework that allows researchers and developers to create and combine multiple code-views.
It can analyze both method-level snippets and program-level snippets using intra-procedural and inter-procedural analysis.
It is built on tree-sitter - a widely used incremental analysis tool that supports over 40 languages.
arXiv Detail & Related papers (2023-07-10T16:46:34Z)
- When Language Model Meets Private Library [25.610036042971043]
In practice, it is common for programmers to write code using private libraries.
This is a challenge for language models since they have never seen private APIs during training.
We propose a novel framework with two modules: the APIRetriever finds useful APIs, and then the APICoder generates code using these APIs.
arXiv Detail & Related papers (2022-10-31T11:42:06Z)
- Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex identifies the parts of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z)
- Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
- DocCoder: Generating Code by Retrieving and Reading Docs [87.88474546826913]
We introduce DocCoder, an approach that explicitly leverages code manuals and documentation.
Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model.
arXiv Detail & Related papers (2022-07-13T06:47:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.