Guiding Language Models of Code with Global Context using Monitors
- URL: http://arxiv.org/abs/2306.10763v2
- Date: Fri, 3 Nov 2023 11:13:15 GMT
- Title: Guiding Language Models of Code with Global Context using Monitors
- Authors: Lakshya A Agrawal, Aditya Kanade, Navin Goyal, Shuvendu K. Lahiri,
Sriram K. Rajamani
- Abstract summary: Language models of code (LMs) work well when the surrounding code provides sufficient context.
LMs suffer from limited awareness of such global context and end up hallucinating.
We propose monitor-guided decoding (MGD) where a monitor uses static analysis to guide the decoding.
- Score: 17.05416012014561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models of code (LMs) work well when the surrounding code provides
sufficient context. This is not true when it becomes necessary to use types,
functionality or APIs defined elsewhere in the repository or a linked library,
especially those not seen during training. LMs suffer from limited awareness of
such global context and end up hallucinating.
Integrated development environments (IDEs) assist developers in understanding
repository context using static analysis. We extend this assistance, enjoyed by
developers, to LMs. We propose monitor-guided decoding (MGD) where a monitor
uses static analysis to guide the decoding. We construct a repository-level
dataset PragmaticCode for method-completion in Java and evaluate MGD on it. On
models of varying parameter scale, by monitoring for type-consistent object
dereferences, MGD consistently improves compilation rates and agreement with
ground truth. Further, LMs with fewer parameters, when augmented with MGD, can
outperform larger LMs. With MGD, SantaCoder-1.1B achieves better compilation
rate and next-identifier match than the much larger text-davinci-003 model.
We also conduct a generalizability study to evaluate the ability of MGD to
generalize to multiple programming languages (Java, C# and Rust), coding
scenarios (e.g., correct number of arguments to method calls), and to enforce
richer semantic constraints (e.g., stateful API protocols). Our data and
implementation are available at https://github.com/microsoft/monitors4codegen .
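To make the decoding mechanism concrete, here is a minimal sketch of the idea, assuming a toy token-scoring interface. `StaticAnalysisMonitor`, its hard-coded member table, and `monitor_guided_step` are hypothetical stand-ins: the paper derives valid continuations from real static analysis of the repository (IDE-style language services) and masks the LM's token logits.

```python
# Minimal sketch of monitor-guided decoding (MGD). Everything below is
# a hypothetical stand-in: the paper derives valid continuations from
# static analysis of the repository, not from a hard-coded dict.
from typing import Dict, List, Optional


class StaticAnalysisMonitor:
    """IDE-style monitor: given the code typed so far, return the
    identifiers that are type-consistent at the current position."""

    def __init__(self, type_members: Dict[str, List[str]]):
        self.type_members = type_members

    def valid_continuations(self, prefix: str) -> Optional[List[str]]:
        # Only constrain decoding right after an object dereference.
        if not prefix.endswith("."):
            return None  # no constraint at other positions
        tokens = prefix[:-1].split()
        receiver = tokens[-1] if tokens else ""
        return self.type_members.get(receiver)


def monitor_guided_step(scores: Dict[str, float], prefix: str,
                        monitor: StaticAnalysisMonitor) -> str:
    """One greedy decoding step that masks tokens the monitor rejects."""
    allowed = monitor.valid_continuations(prefix)
    candidates = scores if allowed is None else {
        tok: s for tok, s in scores.items() if tok in allowed
    }
    if not candidates:  # fall back if the monitor over-constrains
        candidates = scores
    return max(candidates, key=candidates.get)


# The unconstrained argmax would pick the hallucinated member `log`,
# which the receiver's type does not define; the monitor masks it out.
monitor = StaticAnalysisMonitor({"logger": ["info", "warning"]})
lm_scores = {"log": 0.6, "info": 0.3, "warning": 0.1}
print(monitor_guided_step(lm_scores, "logger.", monitor))  # -> info
```

Note that the monitor only intervenes at positions it understands (here, right after an object dereference); elsewhere decoding proceeds unconstrained.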
Related papers
- CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases [13.733229886643041]
Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and MBPP, but struggle with handling entire code repositories.
Similarity-based retrieval often has low recall in complex tasks, while manual tools and APIs are typically task-specific and require expert knowledge.
We introduce CodexGraph, a system that integrates LLM agents with graph database interfaces extracted from code repositories.
arXiv Detail & Related papers (2024-08-07T17:13:59Z)
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present a novel benchmark designed to evaluate repository-level code generation.
We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
- VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
- Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective [85.48043537327258]
We propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy.
Results indicate that MANGO significantly improves the code pass rate over strong baselines.
The logical comment decoding strategy is also notably more robust than Chain-of-Thought prompting.
arXiv Detail & Related papers (2024-04-11T08:30:46Z)
- Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed.
We conducted a case study focused on Large Language Models (LLMs) for code generation, using codetokenizer, an additional tool we built to help analyze code models.
We found that our studied code LLMs had their worst performance on coding structures where the code was not syntactically correct; a minimal sketch of the underlying per-token computation appears after this list.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
- IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [49.903001442804594]
This work investigates the prospect of leveraging compiler intermediate representations (IR) to improve the multilingual capabilities of Code-LMs.
We first compile SLTrans, a parallel dataset consisting of nearly 4M self-contained source code files.
Next, we carry out continued causal language modelling training on SLTrans, forcing the Code-LMs to learn the IR language.
Our resulting models, dubbed IRCoder, display sizeable and consistent gains across a wide variety of code generation tasks and metrics.
arXiv Detail & Related papers (2024-03-06T17:52:08Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation [22.219645213202178]
This paper proposes the "Semantic Chain-of-Thought" approach, named SeCoT, to introduce semantic information of code.
We show that SeCoT achieves state-of-the-art performance, greatly improving the potential of large models for code generation.
arXiv Detail & Related papers (2023-10-16T05:09:58Z)
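Below is the sketch referenced in the Perplexed entry above: computing per-token surprisal with plain Hugging Face transformers to locate where a model struggles on code. It does not use the perplexed library's actual API, and gpt2 is only a stand-in for a code LM.

```python
# Per-token surprisal sketch (not the `perplexed` library's API);
# "gpt2" stands in for a code LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

code = "def add(a, b):\n    return a + b"
ids = tokenizer(code, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits

# Token i is predicted from the logits at position i-1, so shift by one.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
nll = -log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)

# Rank tokens by surprisal: high values mark where the model is "perplexed".
tokens = tokenizer.convert_ids_to_tokens(ids[0, 1:].tolist())
for tok, loss in sorted(zip(tokens, nll.tolist()), key=lambda p: -p[1])[:5]:
    print(f"{tok!r}: surprisal {loss:.2f} nats")
```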
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.