Checking Patch Behaviour against Test Specification
- URL: http://arxiv.org/abs/2107.13296v1
- Date: Wed, 28 Jul 2021 11:39:06 GMT
- Title: Checking Patch Behaviour against Test Specification
- Authors: Haoye Tian, Yinghua Li, Weiguo Pian, Abdoul Kader Kaboré, Kui Liu,
Jacques Klein, Tegawendé F. Bissyandé
- Abstract summary: We propose a hypothesis on how the link between the patch behaviour and failing test specifications can be drawn.
We then propose BATS, an unsupervised learning-based system to predict patch correctness.
- Score: 4.723400023753107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Towards predicting patch correctness in automated program repair (APR), we propose a simple but novel
hypothesis on how the link between the patch behaviour and failing test
specifications can be drawn: similar failing test cases should require similar
patches. We then propose BATS, an unsupervised learning-based system to predict
patch correctness by checking patch Behaviour Against failing Test
Specification. BATS exploits deep representation learning models for code and
patches: for a given failing test case, the yielded embedding is used to
compute similarity metrics in the search for historical similar test cases in
order to identify the associated applied patches, which are then used as a
proxy for assessing generated patch correctness. Experimentally, we first
validate our hypothesis by assessing whether ground-truth developer patches
cluster together in the same way that their associated failing test cases are
clustered. Then, after collecting a large dataset of 1278 plausible patches
(written by developers or generated by some 32 APR tools), we use BATS to
predict correctness: BATS achieves an AUC between 0.557 and 0.718 and a recall
between 0.562 and 0.854 in identifying correct patches. Compared against
previous work, we demonstrate that our approach outperforms the state of the art
in patch correctness prediction, without the need for large labeled
patch datasets in contrast with prior machine learning-based approaches. While
BATS is constrained by the availability of similar test cases, we show that it
can still be complementary to existing approaches: used in conjunction with a
recent approach implementing supervised learning, BATS improves the overall
recall in detecting correct patches. We finally show that BATS can be
complementary to the state-of-the-art PATCH-SIM dynamic approach of identifying
the correct patches for APR tools.
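The retrieval-and-compare idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are assumed to be precomputed plain lists of floats, and the function names, the two thresholds (`test_sim_cutoff`, `patch_sim_threshold`), and the use of a mean cosine similarity as the aggregate score are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_patch_correctness(failing_test_vec, candidate_patch_vec, history,
                              test_sim_cutoff=0.8, patch_sim_threshold=0.5):
    """Score a candidate patch against patches of similar historical failing tests.

    history is a list of (test_embedding, patch_embedding) pairs.
    Both thresholds are illustrative assumptions, not values from the paper.
    Returns (score, predicted_correct), or None when no historical test is
    similar enough to the failing test (the availability constraint the
    abstract mentions).
    """
    # Step 1: retrieve patches whose failing tests resemble the current one.
    similar_patches = [
        patch_vec
        for test_vec, patch_vec in history
        if cosine(failing_test_vec, test_vec) >= test_sim_cutoff
    ]
    if not similar_patches:
        return None

    # Step 2: compare the candidate patch with the retrieved historical patches.
    score = sum(cosine(candidate_patch_vec, p) for p in similar_patches) / len(similar_patches)

    # Step 3: a candidate that resembles those patches is predicted correct.
    return score, score >= patch_sim_threshold


# Toy history: two unrelated bugs with orthogonal test and patch embeddings.
history = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
result_similar = predict_patch_correctness([1.0, 0.0], [1.0, 0.0], history)
result_dissimilar = predict_patch_correctness([1.0, 0.0], [0.0, 1.0], history)
result_no_match = predict_patch_correctness([-1.0, 0.0], [1.0, 0.0], history)
```

Returning `None` when nothing is retrieved keeps "no prediction possible" distinct from "predicted incorrect", which matters if such a predictor is combined with another approach that handles the uncovered cases.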
Related papers
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
- Active Test-Time Adaptation: Theoretical Analyses and An Algorithm [51.84691955495693]
Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings.
We propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting.
arXiv Detail & Related papers (2024-04-07T22:31:34Z)
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- Patch Space Exploration using Static Analysis Feedback [8.13782364161157]
We show how to automatically repair memory safety issues, by leveraging static analysis to guide repair.
Our proposed approach learns what a desirable patch is by inspecting how close a patch is to fixing the bug.
We make repair scalable by creating classes of equivalent patches according to the effect they have on the symbolic heap, and then invoking the validation oracle only once per class of patch equivalence.
arXiv Detail & Related papers (2023-08-01T05:22:10Z)
- APPT: Boosting Automated Patch Correctness Prediction via Fine-tuning Pre-trained Models [15.179895484968476]
We propose APPT, a pre-trained model-based automated patch correctness assessment technique by both pre-training and fine-tuning.
We conduct an experiment on 1,183 Defects4J patches and the experimental results show that APPT achieves prediction accuracy of 79.7% and recall of 83.2%.
arXiv Detail & Related papers (2023-01-29T14:28:26Z)
- Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning [6.269370220586248]
In this paper, we propose a novel technique to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning.
We have conducted experiments on a dataset of 885 patches generated on real-world programs in Defects4J.
Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline.
arXiv Detail & Related papers (2023-01-03T14:16:32Z)
- Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN) that corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS) that simulates i.i.d. data stream from non-i.i.d. stream in a class-balanced manner.
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
- Test-based Patch Clustering for Automatically-Generated Patches Assessment [21.051652050359852]
Overfitting happens when a patch is run and the test suite does not reveal any error, but the patch actually does not fix the underlying bug or it introduces a new defect that is not covered by the test suite.
Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch.
We introduce a novel light-weight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior.
arXiv Detail & Related papers (2022-07-22T13:39:27Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- PatchCensor: Patch Robustness Certification for Transformers via Exhaustive Testing [7.88628640954152]
Vision Transformer (ViT) is known to be highly nonlinear like other classical neural networks and could be easily fooled by both natural and adversarial patch perturbations.
This limitation could pose a threat to the deployment of ViT in the real industrial environment, especially in safety-critical scenarios.
We propose PatchCensor, aiming to certify the patch robustness of ViT by applying exhaustive testing.
arXiv Detail & Related papers (2021-11-19T23:45:23Z)
- Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.