Tea: Program Repair Using Neural Network Based on Program Information Attention Matrix
- URL: http://arxiv.org/abs/2107.08262v1
- Date: Sat, 17 Jul 2021 15:49:22 GMT
- Title: Tea: Program Repair Using Neural Network Based on Program Information Attention Matrix
- Authors: Wenshuo Wang, Chen Wu, Liang Cheng, Yang Zhang
- Abstract summary: We propose a unified representation to capture the syntax, data flow, and control flow aspects of software programs.
We then devise a method to use such a representation to guide the transformer model from NLP in better understanding and fixing buggy programs.
- Score: 14.596847020236657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advance in machine learning (ML)-driven natural language
processing (NLP) points to a promising direction for automatic bug fixing in
software programs, as fixing a buggy program can be transformed into a
translation task. While software
programs contain much richer information than one-dimensional natural language
documents, pioneering work on using ML-driven NLP techniques for automatic
program repair only considered a limited set of such information. We
hypothesize that more comprehensive information of software programs, if
appropriately utilized, can improve the effectiveness of ML-driven NLP
approaches in repairing software programs. As the first step towards proving
this hypothesis, we propose a unified representation to capture the syntax,
data flow, and control flow aspects of software programs, and devise a method
to use such a representation to guide the transformer model from NLP in better
understanding and fixing buggy programs. Our preliminary experiment confirms
that the more comprehensive the program information used, the better ML-driven
NLP techniques perform at fixing bugs in these programs.
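The unified representation can be pictured roughly as follows. This sketch is an illustrative assumption, not the paper's actual design: the function name, the toy token indices, and the uniform edge weights are all invented for the example. The idea is to merge syntax, data-flow, and control-flow relations between program tokens into a single matrix that could bias a transformer's raw attention scores before the softmax.

```python
# Hypothetical sketch: fold several kinds of program relations into one
# "program information attention matrix" over token positions.
def build_attention_matrix(n_tokens, syntax_edges, dataflow_edges,
                           controlflow_edges, weights=(1.0, 1.0, 1.0)):
    """Return an n_tokens x n_tokens matrix where entry [i][j] sums the
    weights of every program relation linking token i and token j."""
    matrix = [[0.0] * n_tokens for _ in range(n_tokens)]
    edge_sets = (syntax_edges, dataflow_edges, controlflow_edges)
    for weight, edges in zip(weights, edge_sets):
        for i, j in edges:
            matrix[i][j] += weight
            matrix[j][i] += weight  # treat every relation as symmetric
    return matrix

# Toy program over 5 token positions (indices are made up for illustration):
syntax = [(0, 1), (2, 3)]    # AST parent/child pairs
dataflow = [(0, 4)]          # a definition at token 0 reaches a use at token 4
controlflow = [(1, 3)]       # one statement flows into the next

bias = build_attention_matrix(5, syntax, dataflow, controlflow)
# In a transformer, such a bias could be added elementwise to the raw
# attention scores before softmax, nudging attention toward tokens that
# are related by the program's structure rather than by position alone.
```

Whether the paper adds, multiplies, or otherwise injects this matrix into attention is not specified in the abstract; the additive-bias reading here is one common way such structural matrices are used.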
Related papers
- Multi-Task Program Error Repair and Explanatory Diagnosis [28.711745671275477]
We present a novel machine-learning approach for Multi-task Program Error Repair and Explanatory Diagnosis (mPRED).
A pre-trained language model is used to encode the source code, and a downstream model is specifically designed to identify and repair errors.
To aid in visualizing and analyzing program structure, we use a graph neural network.
arXiv Detail & Related papers (2024-10-09T05:09:24Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, making them better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
- Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments [26.236420215606238]
We develop a framework called PaR that is powered by a Large Language Model.
PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair.
The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2024-04-02T09:12:21Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM, then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
- Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering [2.3826139428423576]
Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained with a large enough dataset.
Some recent studies demonstrated strong empirical evidence that code review could improve the program repair further.
We investigate if this inherent knowledge of PL and NL can be utilized to improve automated program repair.
arXiv Detail & Related papers (2023-04-16T17:29:51Z)
- On ML-Based Program Translation: Perils and Promises [17.818482089078028]
This work investigates unsupervised program translators and where and why they fail.
We develop a rule-based program mutation engine, which pre-processes the input code if it follows specific patterns and post-processes the output if it follows certain patterns.
In the future, we envision an end-to-end program translation tool where programming domain knowledge can be embedded into an ML-based translation pipeline.
arXiv Detail & Related papers (2023-02-21T16:42:20Z)
- Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models [1.1924369482115011]
We show that a syntax-guided program reduction technique is faster and provides smaller sets of key tokens in reduced programs.
We also show that the key tokens could be used in generating adversarial examples for up to 65% of the input programs.
arXiv Detail & Related papers (2022-05-28T09:04:57Z)
- A Conversational Paradigm for Program Synthesis [110.94409515865867]
We propose a conversational program synthesis approach via large language models.
We train a family of large language models, called CodeGen, on natural language and programming language data.
Our findings show the emergence of conversational capabilities and the effectiveness of the proposed conversational program synthesis paradigm.
arXiv Detail & Related papers (2022-03-25T06:55:15Z)
- Improving Compositionality of Neural Networks by Decoding Representations to Inputs [83.97012077202882]
We bridge the benefits of traditional and deep learning programs by jointly training a generative model to constrain neural network activations to "decode" back to inputs.
We demonstrate applications of decodable representations to out-of-distribution detection, adversarial examples, calibration, and fairness.
arXiv Detail & Related papers (2021-06-01T20:07:16Z)
- How could Neural Networks understand Programs? [67.4217527949013]
It is difficult to build a model that better understands programs, either by directly applying off-the-shelf NLP pre-training techniques to the source code, or by heuristically adding features to the model.
We propose a novel program semantics learning paradigm in which the model learns from information composed of (1) representations that align well with the fundamental operations in operational semantics, and (2) information about environment transitions.
arXiv Detail & Related papers (2021-05-10T12:21:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.