Exploring Plausible Patches Using Source Code Embeddings in JavaScript
- URL: http://arxiv.org/abs/2103.16846v1
- Date: Wed, 31 Mar 2021 06:57:10 GMT
- Title: Exploring Plausible Patches Using Source Code Embeddings in JavaScript
- Authors: Viktor Csuvik, Dániel Horváth, Márk Lajkó, László Vidács
- Abstract summary: We trained a Doc2Vec model on an open-source JavaScript project and generated 465 patches for 10 bugs in it.
These plausible patches, along with the developer fix, are then ranked based on their similarity to the original program.
We analyzed these similarity lists and found that plain document embeddings may lead to misclassification.
- Score: 1.3327130030147563
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the immense popularity of the Automated Program Repair (APR) field,
the question of patch validation is still open. Most present-day approaches
follow the so-called Generate-and-Validate approach, in which a candidate
solution is first generated and then validated against an oracle. The latter,
however, might not give a reliable result, because such oracles, typically the
test suite, are imperfect. Although (re-)running the test suite is the obvious
choice, in real-life applications the problem of over- and underfitting often
occurs, resulting in inadequate patches. Efforts to tackle this problem include
patch filtering, test suite expansion, careful patch generation, and many more.
Most approaches to date use post-filtering that relies either on test execution
traces or on some similarity concept measured on the generated patches. Our
goal is to investigate the nature of these similarity-based approaches. To do
so, we trained a Doc2Vec model on an open-source JavaScript project and
generated 465 patches for 10 bugs in it. These plausible patches, along with
the developer fix, are then ranked based on their similarity to the original
program. We analyzed these similarity lists and found that plain document
embeddings may lead to misclassification, since they fail to capture nuanced
code semantics. Nevertheless, in some cases they also provided useful
information, thus helping to better understand the area of Automated Program
Repair.
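The similarity ranking described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the paper uses a trained Doc2Vec model, whereas the sketch below substitutes a plain token-count vector with cosine similarity so that it runs with the standard library alone. The `clamp` function, the patch names, and all helper names are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(source: str) -> list[str]:
    """Split JavaScript-like source into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S", source)


def embed(source: str) -> Counter:
    """Stand-in embedding: a sparse token-count vector (an assumption, NOT Doc2Vec)."""
    return Counter(tokenize(source))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_patches(original: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate patches by similarity to the original program, highest first."""
    base = embed(original)
    scored = [(name, cosine(base, embed(code))) for name, code in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: a buggy function that is missing its upper-bound check.
original = "function clamp(x, lo, hi) { if (x < lo) return lo; return x; }"
candidates = {
    "developer_fix": (
        "function clamp(x, lo, hi) "
        "{ if (x < lo) return lo; if (x > hi) return hi; return x; }"
    ),
    "plausible_1": "function clamp(x, lo, hi) { if (x <= lo) return lo; return x; }",
    "plausible_2": "function clamp(x, lo, hi) { return Math.min(Math.max(x, lo), hi); }",
}

ranking = rank_patches(original, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On this toy data the near-identical `plausible_1`, which still lacks the upper-bound check, ranks above the developer fix. That is a small-scale analogue of the misclassification the abstract attributes to plain similarity measures, though it says nothing about how a trained Doc2Vec model would score the same inputs.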
Related papers
- Otter: Generating Tests from Issues to Validate SWE Patches [12.353105297285802]
This paper introduces Otter, an LLM-based solution for generating tests from issues.
Otter augments LLMs with rule-based analysis to check and repair their outputs, and introduces a novel self-reflective action planning stage.
Experiments show Otter outperforming state-of-the-art systems for generating tests from issues.
arXiv Detail & Related papers (2025-02-07T22:41:31Z)
- SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation [84.07909405887696]
This paper is the first to consider fully unsupervised industrial anomaly detection (i.e., unsupervised AD with noisy data).
We propose memory-based unsupervised AD methods, SoftPatch and SoftPatch+, which efficiently denoise the data at the patch level.
Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset.
Comprehensive experiments conducted in diverse noise scenarios demonstrate that both SoftPatch and SoftPatch+ outperform the state-of-the-art AD methods on the MVTecAD, ViSA, and BTAD benchmarks.
arXiv Detail & Related papers (2024-12-30T11:16:49Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen).
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Patch Space Exploration using Static Analysis Feedback [8.13782364161157]
We show how to automatically repair memory safety issues, by leveraging static analysis to guide repair.
Our proposed approach learns what a desirable patch is by inspecting how close a patch is to fixing the bug.
We make repair scalable by creating classes of equivalent patches according to the effect they have on the symbolic heap, and then invoking the validation oracle only once per class of patch equivalence.
arXiv Detail & Related papers (2023-08-01T05:22:10Z)
- Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
- Accelerating Patch Validation for Program Repair with Interception-Based Execution Scheduling [15.592392495402809]
We investigate existing mutation testing techniques and identify five classes of acceleration techniques that are suitable for general-purpose patch validation.
We propose two novel approaches: execution scheduling, which detects the equivalence between patches online, and interception-based instrumentation, which intercepts the changes of patches to the system state.
Our large-scale evaluation with four APR approaches shows that ExpressAPR accelerates patch validation by 137.1x over plain validation or 8.8x over the state-of-the-art approach.
arXiv Detail & Related papers (2023-05-06T06:45:25Z)
- Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT [13.632199062382746]
Automated Program Repair (APR) aims to automatically generate patches for buggy programs.
Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR.
We propose ChatRepair, the first fully automated conversation-driven APR approach.
arXiv Detail & Related papers (2023-04-01T20:57:33Z)
- Test-based Patch Clustering for Automatically-Generated Patches Assessment [21.051652050359852]
Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite.
Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch.
We introduce a novel light-weight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior.
arXiv Detail & Related papers (2022-07-22T13:39:27Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
- Open-sourced Dataset Protection via Backdoor Watermarking [87.15630326131901]
We propose a backdoor embedding based dataset watermarking method to protect an open-sourced image-classification dataset.
We use a hypothesis test guided method for dataset verification based on the posterior probability generated by the suspicious third-party model.
arXiv Detail & Related papers (2020-10-12T16:16:27Z)