Better Context Makes Better Code Language Models: A Case Study on
Function Call Argument Completion
- URL: http://arxiv.org/abs/2306.00381v1
- Date: Thu, 1 Jun 2023 06:25:58 GMT
- Title: Better Context Makes Better Code Language Models: A Case Study on
Function Call Argument Completion
- Authors: Hengzhi Pei, Jinman Zhao, Leonard Lausen, Sheng Zha, George Karypis
- Abstract summary: We show that existing code completion models do not yield good results on our completion task.
We query a program analyzer for information relevant to a given function call, and consider ways to provide the analyzer results to different code completion models during inference and training.
Our experiments show that providing access to the function implementation and function usages greatly improves the argument completion performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained code language models have enabled great progress towards program
synthesis. However, common approaches only consider in-file local context and
thus miss information and constraints imposed by other parts of the codebase
and its external dependencies. Existing code completion benchmarks also lack
such context. To resolve these restrictions we curate a new dataset of
permissively licensed Python packages that includes full projects and their
dependencies and provide tools to extract non-local information with the help
of program analyzers. We then focus on the task of function call argument
completion which requires predicting the arguments to function calls. We show
that existing code completion models do not yield good results on our
completion task. To better solve this task, we query a program analyzer for
information relevant to a given function call, and consider ways to provide the
analyzer results to different code completion models during inference and
training. Our experiments show that providing access to the function
implementation and function usages greatly improves the argument completion
performance. Our ablation study provides further insights on how different
types of information available from the program analyzer and different ways of
incorporating the information affect the model performance.
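As a rough illustration of the idea in the abstract, the sketch below gathers a function's implementation and its existing usages, then prepends them to the local prefix that a completion model would see. It is not the authors' pipeline: Python's standard `ast` module stands in for the program analyzer. The names `SOURCE`, `collect_context`, and `build_prompt`, and the prompt layout, are hypothetical choices made for this example only.

```python
import ast

# Hypothetical toy "project" source; in practice this would span many files.
SOURCE = '''
def resize(image, width, height, keep_aspect=True):
    """Resize an image to the given size."""
    return (image, width, height, keep_aspect)

thumb = resize(photo, 128, 128)
banner = resize(logo, 800, 200, keep_aspect=False)
'''


def collect_context(source, func_name):
    """Return (implementation, usages) of func_name found in source.

    Assumption: the callee is a plain top-level name; methods, aliases,
    and cross-file imports would need a real program analyzer.
    """
    tree = ast.parse(source)
    implementation = None
    usages = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            implementation = ast.get_source_segment(source, node)
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        ):
            usages.append(ast.get_source_segment(source, node))
    return implementation, usages


def build_prompt(local_prefix, implementation, usages):
    """Prepend the non-local context to the in-file prefix (illustrative layout)."""
    parts = []
    if implementation is not None:
        parts.append("# Function implementation:\n" + implementation)
    if usages:
        parts.append("# Example usages:\n" + "\n".join(usages))
    parts.append(local_prefix)
    return "\n\n".join(parts)


if __name__ == "__main__":
    impl, calls = collect_context(SOURCE, "resize")
    # The model would be asked to complete the arguments after "resize(".
    print(build_prompt("icon = resize(", impl, calls))
```

The prompt ends with the unfinished call, so a left-to-right code model would generate the arguments with the callee's signature and prior call sites in view.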
Related papers
- Evaluating Long Range Dependency Handling in Code Generation Models using Multi-Step Key Retrieval [3.1767625261233046]
We analyze the ability of several code generation models to handle long range dependencies using a suite of multi-step key retrieval tasks in context windows up to 8k tokens in length.
We find that performance degrades significantly (up to 2x) when a function references another function that is defined later in the prompt.
We also observe that models that use sliding window attention mechanisms have difficulty handling references further than the size of a single window.
arXiv Detail & Related papers (2024-07-23T02:45:22Z)
- De-fine: Decomposing and Refining Visual Programs with Auto-Feedback [75.62712247421146]
De-fine is a training-free framework that decomposes complex tasks into simpler subtasks and refines programs through auto-feedback.
Our experiments across various visual tasks show that De-fine creates more robust programs.
arXiv Detail & Related papers (2023-11-21T06:24:09Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Fact-Checking Complex Claims with Program-Guided Reasoning [99.7212240712869]
Program-Guided Fact-Checking (ProgramFC) is a novel fact-checking model that decomposes complex claims into simpler sub-tasks.
We first leverage the in-context learning ability of large language models to generate reasoning programs.
We execute the program by delegating each sub-task to the corresponding sub-task handler.
arXiv Detail & Related papers (2023-05-22T06:11:15Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for code generation from natural language descriptions.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- Visualizing the Relationship Between Encoded Linguistic Information and Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality.
We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performances.
Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple problems) accuracy in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)