Related papers: Learning to Extend Program Graphs to Work-in-Progress Code

Learning to Extend Program Graphs to Work-in-Progress Code

URL: http://arxiv.org/abs/2105.14038v1
Date: Fri, 28 May 2021 18:12:22 GMT
Title: Learning to Extend Program Graphs to Work-in-Progress Code
Authors: Xuechen Li, Chris J. Maddison, Daniel Tarlow
Abstract summary: We extend the notion of program graphs to work-in-progress code by learning to predict edge relations between tokens. We consider the tasks of code completion and localizing and repairing variable misuse in a work-in-process scenario. We demonstrate that training relation-aware models with fine-tuned edges consistently leads to improved performance on both tasks.
Score: 31.235862838381966
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Source code spends most of its time in a broken or incomplete state during software development. This presents a challenge to machine learning for code, since high-performing models typically rely on graph structured representations of programs derived from traditional program analyses. Such analyses may be undefined for broken or incomplete code. We extend the notion of program graphs to work-in-progress code by learning to predict edge relations between tokens, training on well-formed code before transferring to work-in-progress code. We consider the tasks of code completion and localizing and repairing variable misuse in a work-in-process scenario. We demonstrate that training relation-aware models with fine-tuned edges consistently leads to improved performance on both tasks.

Related papers

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities [19.455889970335967]
Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. One main challenge of pre-trained models for code generation is the semantic gap between natural language requirements and source code. Retrieval-augmented framework can be leveraged to help understand the requirements and provide guidance for the generation process.
arXiv Detail & Related papers (2025-01-23T15:17:51Z)
VISUALCODER: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning [10.70881967278009]
We introduce Visual Coder, a simple yet effective approach that enhances code reasoning by integrating multimodal Chain-of-Thought (CoT) reasoning with a visual Control Flow Graph (CFG) By aligning code snippets with their corresponding CFGs, Visual Coder provides deeper insights into execution flow, enabling more accurate predictions of code behavior. Our experiments demonstrate that augmenting LLMs with visual CFGs significantly outperforms text-based CFG descriptions in code reasoning tasks.
arXiv Detail & Related papers (2024-10-30T19:07:01Z)
Learning to Predict Program Execution by Modeling Dynamic Dependency on Code Graphs [11.347234752942684]
This paper introduces a novel machine learning-based framework called CodeFlow to predict code coverage and detect runtime errors. CodeFlow represents all possible execution paths and the relationships between different statements. It learns dynamic dependencies through execution traces, which reflect the impacts among statements during execution.
arXiv Detail & Related papers (2024-08-05T20:32:00Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution. We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code. We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables. We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens. We show that despite the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.