Related papers: Test-based Patch Clustering for Automatically-Generated Patches Assessment

Test-based Patch Clustering for Automatically-Generated Patches Assessment

URL: http://arxiv.org/abs/2207.11082v2
Date: Tue, 27 Aug 2024 14:46:51 GMT
Title: Test-based Patch Clustering for Automatically-Generated Patches Assessment
Authors: Matias Martinez, Maria Kechagia, Anjana Perera, Justyna Petke, Federica Sarro, Aldeida Aleti,
Abstract summary: Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite. Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch. We introduce a novel light-weight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior.
Score: 21.051652050359852
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Previous studies have shown that Automated Program Repair (APR) techniques suffer from the overfitting problem. Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite. Therefore, the patches generated by apr tools need to be validated by human programmers, which can be very costly, and prevents apr tool adoption in practice. Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch. We introduce a novel light-weight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior. xTestCluster is applied after the patch generation phase in order to analyze the generated patches from one or more repair tools and to provide more information about those patches for facilitating patch assessment. The novelty of xTestCluster lies in using information from execution of newly generated test cases to cluster patches generated by multiple APR approaches. A cluster is formed of patches that fail on the same generated test cases. The output from xTestCluster gives developers a) a way of reducing the number of patches to analyze, as they can focus on analyzing a sample of patches from each cluster, b) additional information attached to each patch. After analyzing 902 plausible patches from 21 Java APR tools, our results show that xTestCluster is able to reduce the number of patches to review and analyze with a median of 50%. xTestCluster can save a significant amount of time for developers that have to review the multitude of patches generated by apr tools, and provides them with new test cases that expose the differences in behavior between generated patches.

Related papers

All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning [76.79222779026634]
We establish two key principles for AIGI detection through systematic analysis. textbf(1) All Patches Matter: Unlike conventional image classification where discriminative features concentrate on object-centric regions, each patch in AIGIs inherently contains synthetic artifacts due to the uniform generation process. textbf (2) More Patches Better: Leveraging distributed artifacts across more patches improves detection by capturing complementary forensic evidence. textbfPanoptic textbfPatch textbfLearning (PPL) framework.
arXiv Detail & Related papers (2025-04-02T06:32:09Z)
Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study [20.46588369793562]
Most popular benchmarks for automated issue solving are SWE-bench and its human-filtered subset SWE-bench Verified. This paper presents an in-depth empirical study of the correctness of plausible patches generated by three state-of-the-art issue-solving tools evaluated on SWE-bench Verified.
arXiv Detail & Related papers (2025-03-19T14:02:21Z)
STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay [76.06127233986663]
Test-time adaptation (TTA) aims to address the distribution shift between the training and test data with only unlabeled data at test time. This paper pays attention to the problem that conducts both sample recognition and outlier rejection during inference while outliers exist. We propose a new approach called STAble Memory rePlay (STAMP), which performs optimization over a stable memory bank instead of the risky mini-batch.
arXiv Detail & Related papers (2024-07-22T16:25:41Z)
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization [64.62570402941387]
We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift to bridge the gap in the test domain. Our method improves zero-shot top- 1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe.
arXiv Detail & Related papers (2023-11-02T17:59:32Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
Patch Space Exploration using Static Analysis Feedback [8.13782364161157]
We show how to automatically repair memory safety issues, by leveraging static analysis to guide repair. Our proposed approach learns what a desirable patch is by inspecting how close a patch is to fixing the bug. We make repair scalable by creating classes of equivalent patches according to the effect they have on the symbolic heap, and then invoking the validation oracle only once per class of patch equivalence.
arXiv Detail & Related papers (2023-08-01T05:22:10Z)
Patch-aware Batch Normalization for Improving Cross-domain Robustness [55.06956781674986]
Cross-domain tasks present a challenge in which the model's performance will degrade when the training set and the test set follow different distributions. We propose a novel method called patch-aware batch normalization (PBN) By exploiting the differences between local patches of an image, our proposed PBN can effectively enhance the robustness of the model's parameters.
arXiv Detail & Related papers (2023-04-06T03:25:42Z)
Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT [10.071615423169902]
Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Recent APR work has been focused on leveraging modern Large Language Models (LLMs) to directly generate patches for APR. We propose ChatRepair, the first fully automated conversation-driven APR approach.
arXiv Detail & Related papers (2023-04-01T20:57:33Z)
PatchZero: Zero-Shot Automatic Patch Correctness Assessment [13.19425284402493]
We propose toolname, the patch correctness assessment by adopting a large language model for code. toolname prioritizes labeled patches from existing APR tools that exhibit semantic similarity to those generated by new APR tools. Our experimental results showed that toolname can achieve an accuracy of 84.4% and an F1-score of 86.5% on average.
arXiv Detail & Related papers (2023-03-01T03:12:11Z)
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection [142.24869736769432]
Adversarial patch attacks pose a serious threat to state-of-the-art object detectors. We propose Segment and Complete defense (SAC), a framework for defending object detectors against patch attacks. We show SAC can significantly reduce the targeted attack success rate of physical patch attacks.
arXiv Detail & Related papers (2021-12-08T19:18:48Z)
Checking Patch Behaviour against Test Specification [4.723400023753107]
We propose a hypothesis on how the link between the patch behaviour and failing test specifications can be drawn. We then propose BATS, an unsupervised learning-based system to predict patch correctness.
arXiv Detail & Related papers (2021-07-28T11:39:06Z)
Exploring Plausible Patches Using Source Code Embeddings in JavaScript [1.3327130030147563]
We trained a Doc2Vec model on an open-source JavaScript project and generated 465 patches for 10 bugs in it. These plausible patches alongside with the developer fix are then ranked based on their similarity to the original program. We analyzed these similarity lists and found that plain document embeddings may lead to misclassification.
arXiv Detail & Related papers (2021-03-31T06:57:10Z)
Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches [52.67723703088284]
We propose a novel framework called multi-patch generative adversarial nets (MPGAN) MPGAN synthesises local patch features and labels unseen classes with a novel weighted voting strategy. MPGAN has significantly greater accuracy than state-of-the-art methods.
arXiv Detail & Related papers (2020-07-27T05:49:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.