CodeScholar: Growing Idiomatic Code Examples
- URL: http://arxiv.org/abs/2312.15157v1
- Date: Sat, 23 Dec 2023 04:06:15 GMT
- Title: CodeScholar: Growing Idiomatic Code Examples
- Authors: Manish Shetty, Koushik Sen, Ion Stoica
- Abstract summary: We present CodeScholar, a tool that generates idiomatic code examples demonstrating the common usage of API methods.
It includes a novel neural-guided search technique over graphs that grows the query APIs into idiomatic code examples.
We show that CodeScholar helps not only developers but also LLM-powered programming assistants generate correct code in a program synthesis setting.
- Score: 26.298684667238994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Programmers often search for usage examples for API methods. A tool that
could generate realistic, idiomatic, and contextual usage examples for one or
more APIs would be immensely beneficial to developers. Such a tool would
relieve the need for a deep understanding of the API landscape, augment
existing documentation, and help discover interactions among APIs. We present
CodeScholar, a tool that generates idiomatic code examples demonstrating the
common usage of API methods. It includes a novel neural-guided search technique
over graphs that grows the query APIs into idiomatic code examples. Our user
study demonstrates that in 70% of cases, developers prefer CodeScholar-generated
examples over those produced by state-of-the-art large language models (LLMs)
like GPT-3.5. We quantitatively evaluate 60 single-API and 25 multi-API queries
from 6 popular Python libraries and show that, across the board, CodeScholar
generates more realistic, diverse, and concise examples. In addition, we show
that CodeScholar helps not only developers but also LLM-powered programming
assistants generate correct code in a program synthesis setting.
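To make the abstract's search idea concrete, the sketch below (Python, the language the paper evaluates on) mimics the "grow the query APIs into idiomatic code examples" loop in miniature: it seeds candidates with statements that mention the query API, repeatedly expands them with co-occurring statements from a toy corpus, and keeps the highest-scoring candidates. The toy corpus, the frequency-based scorer, and all helper names are illustrative assumptions; CodeScholar itself searches over program graphs with a learned neural model rather than the crude frequency heuristic used here.

    from collections import Counter

    # Toy "corpus": each program is a list of statements, standing in for the
    # program graphs mined from real code (an assumption for illustration only).
    CORPUS = [
        ["arr = np.arange(10)", "idx = np.random.choice(arr, 3)", "print(idx)"],
        ["arr = np.arange(5)", "idx = np.random.choice(arr, 2, replace=False)"],
        ["vals = np.random.choice([1, 2, 3], size=4)", "print(vals.sum())"],
    ]

    def grow_examples(query_api, corpus, beam_width=2, max_steps=3):
        """Grow candidate examples outward from statements that use the query API."""
        # Seed: minimal candidates containing only the query API call.
        frontier = [[s] for prog in corpus for s in prog if query_api in s]

        for _ in range(max_steps):
            candidates = []
            for example in frontier:
                for prog in corpus:
                    if not all(s in prog for s in example):
                        continue
                    # Expand by one statement that co-occurs in the same program,
                    # keeping the program's statement order in the grown example.
                    for s in prog:
                        if s not in example:
                            candidates.append([t for t in prog if t in example or t == s])
            if not candidates:
                break
            # Stand-in for neural guidance: rank expansions by how often their
            # statements appear across the corpus, i.e. how "idiomatic" they look.
            freq = Counter(s for prog in corpus for s in prog)
            candidates.sort(key=lambda ex: sum(freq[s] for s in ex), reverse=True)
            frontier = candidates[:beam_width]

        return ["\n".join(ex) for ex in frontier]

    print(grow_examples("np.random.choice", CORPUS)[0])

Run on the toy corpus, this prints a small, ordered snippet built around np.random.choice; the beam-search structure (expand, score, prune) is the part that loosely mirrors the neural-guided search described above, with the learned scoring model replaced by a frequency count.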
Related papers
- A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models [14.665460257371164]
Large language models (LLMs) like GitHub Copilot and ChatGPT have emerged as powerful tools for code generation.
We propose AutoAPIEval, a framework designed to evaluate the capabilities of LLMs in API-oriented code generation.
arXiv Detail & Related papers (2024-09-23T17:22:09Z)
- A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development.
Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z)
- CodeUpdateArena: Benchmarking Knowledge Editing on API Updates [77.81663273436375]
We present CodeUpdateArena, a benchmark for knowledge editing in the code domain.
An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example.
Our benchmark covers updates of various types to 54 functions from seven diverse Python packages.
arXiv Detail & Related papers (2024-07-08T17:55:04Z)
- Contextual API Completion for Unseen Repositories Using LLMs [6.518508607788089]
We introduce a novel technique to mitigate hallucinations by leveraging global and local contextual information within a code repository for API completion tasks.
Our approach is tailored to refine code completion tasks, with a focus on optimizing local API completions.
Our tool, LANCE, surpasses Copilot by 143% and 142% for API token completion and conversational API completion, respectively.
arXiv Detail & Related papers (2024-05-07T18:22:28Z)
- Exploring the Impact of Source Code Linearity on the Programmers Comprehension of API Code Examples [0.0]
We investigated whether the (a) linearity and (b) length of the source code in API code examples affect users' performance in terms of correctness and time spent.
We conducted an online controlled code comprehension experiment with 61 Java developers.
arXiv Detail & Related papers (2024-04-03T00:40:38Z)
- Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
- COMEX: A Tool for Generating Customized Source Code Representations [7.151800146054561]
COMEX is a framework that allows researchers and developers to create and combine multiple code-views.
It can analyze both method-level snippets and program-level snippets using intra-procedural and inter-procedural analysis.
It is built on tree-sitter - a widely used incremental analysis tool that supports over 40 languages.
arXiv Detail & Related papers (2023-07-10T16:46:34Z)
- When Language Model Meets Private Library [25.610036042971043]
In practice, it is common for programmers to write code using private libraries.
This is a challenge for language models since they have never seen private APIs during training.
We propose a novel framework with two modules: the APIRetriever finds useful APIs, and then the APICoder generates code using these APIs.
arXiv Detail & Related papers (2022-10-31T11:42:06Z)
- Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex identifies the parts of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z)
- Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
- DocCoder: Generating Code by Retrieving and Reading Docs [87.88474546826913]
We introduce DocCoder, an approach that explicitly leverages code manuals and documentation.
Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model.
arXiv Detail & Related papers (2022-07-13T06:47:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.