Exploring Plausible Patches Using Source Code Embeddings in JavaScript
- URL: http://arxiv.org/abs/2103.16846v1
- Date: Wed, 31 Mar 2021 06:57:10 GMT
- Title: Exploring Plausible Patches Using Source Code Embeddings in JavaScript
- Authors: Viktor Csuvik, D\'aniel Horv\'ath, M\'ark Lajk\'o, L\'aszl\'o Vid\'acs
- Abstract summary: We trained a Doc2Vec model on an open-source JavaScript project and generated 465 patches for 10 bugs in it.
These plausible patches alongside with the developer fix are then ranked based on their similarity to the original program.
We analyzed these similarity lists and found that plain document embeddings may lead to misclassification.
- Score: 1.3327130030147563
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the immense popularity of the Automated Program Repair (APR) field,
the question of patch validation is still open. Most of the present-day
approaches follow the so-called Generate-and-Validate approach, where first a
candidate solution is being generated and after validated against an oracle.
The latter, however, might not give a reliable result, because of the
imperfections in such oracles; one of which is usually the test suite. Although
(re-) running the test suite is right under one's nose, in real life
applications the problem of over- and underfitting often occurs, resulting in
inadequate patches. Efforts that have been made to tackle with this problem
include patch filtering, test suite expansion, careful patch producing and many
more. Most approaches to date use post-filtering relying either on test
execution traces or make use of some similarity concept measured on the
generated patches. Our goal is to investigate the nature of these
similarity-based approaches. To do so, we trained a Doc2Vec model on an
open-source JavaScript project and generated 465 patches for 10 bugs in it.
These plausible patches alongside with the developer fix are then ranked based
on their similarity to the original program. We analyzed these similarity lists
and found that plain document embeddings may lead to misclassification - it
fails to capture nuanced code semantics. Nevertheless, in some cases it also
provided useful information, thus helping to better understand the area of
Automated Program Repair.
Related papers
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Program Repair by Fuzzing over Patch and Input Space [8.887795868777985]
We propose a search-based program repair as patch-level fuzzing.
We use a patch-space fuzzer to explore a patch space, while using a traditional input level fuzzer to rule out patch candidates.
We show that patch-level fuzzing and input-level fuzzing can be combined, for a co-exploration of both spaces.
arXiv Detail & Related papers (2023-08-01T17:02:45Z) - Patch Space Exploration using Static Analysis Feedback [8.13782364161157]
We show how to automatically repair memory safety issues, by leveraging static analysis to guide repair.
Our proposed approach learns what a desirable patch is by inspecting how close a patch is to fixing the bug.
We make repair scalable by creating classes of equivalent patches according to the effect they have on the symbolic heap, and then invoking the validation oracle only once per class of patch equivalence.
arXiv Detail & Related papers (2023-08-01T05:22:10Z) - Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z) - Accelerating Patch Validation for Program Repair with Interception-Based
Execution Scheduling [15.592392495402809]
We investigate existing mutation testing techniques and identify five classes of acceleration techniques that are suitable for general-purpose patch validation.
We propose two novel approaches: execution scheduling, which detects the equivalence between patches online, and interception-based instrumentation, which intercepts the changes of patches to the system state.
Our large-scale evaluation with four APR approaches shows that ExpressAPR accelerates patch validation by 137.1x over plainvalidation or 8.8x over the state-of-the-art approach.
arXiv Detail & Related papers (2023-05-06T06:45:25Z) - Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each
using ChatGPT [10.071615423169902]
Automated Program Repair (APR) aims to automatically generate patches for buggy programs.
Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR.
We propose ChatRepair, the first fully automated conversation-driven APR approach.
arXiv Detail & Related papers (2023-04-01T20:57:33Z) - Test-based Patch Clustering for Automatically-Generated Patches Assessment [21.051652050359852]
Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite.
Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch.
We introduce a novel light-weight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior.
arXiv Detail & Related papers (2022-07-22T13:39:27Z) - D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using
Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z) - Open-sourced Dataset Protection via Backdoor Watermarking [87.15630326131901]
We propose a emphbackdoor embedding based dataset watermarking method to protect an open-sourced image-classification dataset.
We use a hypothesis test guided method for dataset verification based on the posterior probability generated by the suspicious third-party model.
arXiv Detail & Related papers (2020-10-12T16:16:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.