Semantic-Preserving Transformations as Mutation Operators: A Study on Their Effectiveness in Defect Detection
- URL: http://arxiv.org/abs/2503.23448v1
- Date: Sun, 30 Mar 2025 14:00:22 GMT
- Title: Semantic-Preserving Transformations as Mutation Operators: A Study on Their Effectiveness in Defect Detection
- Authors: Max Hort, Linas Vidziunas, Leon Moonen
- Abstract summary: We collect existing publications which implemented semantic-preserving transformations and shared their implementation. We empirically study the effectiveness of three different ensemble strategies for enhancing defect detection tools. Our results show that reusing shared semantic-preserving transformations is difficult, sometimes even causing wrongful changes to the semantics.
- Score: 3.3590922002216197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in defect detection use language models. Existing works have enhanced the training data to improve the models' robustness when applied to semantically identical code (i.e., predictions should be the same). However, the use of semantically identical code has not been considered for improving the tools during their application - a concept closely related to metamorphic testing. The goal of our study is to determine whether we can use semantic-preserving transformations, analogous to mutation operators, to improve the performance of defect detection tools in the testing stage. We first collect existing publications which implemented semantic-preserving transformations and shared their implementation, such that we can reuse them. We empirically study the effectiveness of three different ensemble strategies for enhancing defect detection tools. We apply the collected transformations on the Devign dataset, considering vulnerabilities as a type of defect, and two fine-tuned large language models for defect detection (VulBERTa, PLBART). We found 28 publications with 94 different transformations. We chose to implement 39 transformations from four of the publications, but a manual check revealed that 23 out of 39 transformations change code semantics. Using the 16 remaining correct transformations and three ensemble strategies, we were not able to increase the accuracy of the defect detection models. Our results show that reusing shared semantic-preserving transformations is difficult, sometimes even causing wrongful changes to the semantics. Keywords: defect detection, language model, semantic-preserving transformation, ensemble
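To make the core idea concrete, here is a minimal sketch of two semantic-preserving mutation operators and a majority-vote ensemble, written against Python's stdlib `ast` module. This is an illustration, not the authors' implementation: the study targets C functions from the Devign dataset, and `model` below stands in for a fine-tuned classifier such as VulBERTa or PLBART.

```python
import ast
from collections import Counter

class AppendDeadCode(ast.NodeTransformer):
    """Mutation operator: append a no-op block to each function body.
    'if False: pass' never executes, so behavior is preserved."""
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.body.append(ast.parse("if False:\n    pass").body[0])
        return node

class DoubleNegateIf(ast.NodeTransformer):
    """Mutation operator: 'if cond:' -> 'if not (not cond):'.
    Double negation keeps the same branch being taken."""
    def visit_If(self, node):
        self.generic_visit(node)
        node.test = ast.UnaryOp(op=ast.Not(),
                                operand=ast.UnaryOp(op=ast.Not(), operand=node.test))
        return node

def apply_operator(source: str, operator: ast.NodeTransformer) -> str:
    tree = operator.visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # ast.unparse: Python 3.9+

def ensemble_predict(model, source: str, operators) -> int:
    """Majority vote over the original snippet and its transformed variants.
    `model` is a placeholder: source code -> 0/1 defect label."""
    variants = [source] + [apply_operator(source, op) for op in operators]
    votes = [model(v) for v in variants]
    return Counter(votes).most_common(1)[0][0]
```

Usage would look like `label = ensemble_predict(clf, snippet, [AppendDeadCode(), DoubleNegateIf()])`; the paper's point is that such operators are easy to get subtly wrong, which is why each one needs a manual semantics check.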
Related papers
- Evaluating the Effectiveness of Small Language Models in Detecting Refactoring Bugs [0.6133301815445301]
This study evaluates the effectiveness of Small Language Models (SLMs) in detecting two types of refactoring bugs in Java and Python. The study covers 16 refactoring types and employs zero-shot prompting on consumer-grade hardware to evaluate the models' ability to reason about correctness without explicit prior training. The proprietary o3-mini-high model achieved the highest detection rate, identifying 84.3% of Type I bugs.
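For intuition, a minimal sketch of what a zero-shot probe for refactoring bugs could look like; the prompt wording and the `query_model` client are assumptions for illustration, not the study's actual protocol.

```python
PROMPT_TEMPLATE = """You are reviewing a refactoring.
Original method:
{before}

Refactored method:
{after}

Question: does the refactored method preserve the original behavior?
Answer strictly with YES or NO."""

def detect_refactoring_bug(before: str, after: str, query_model) -> bool:
    """Returns True if the model flags a behavior change (a candidate bug).

    `query_model` is a hypothetical callable wrapping any chat/completion
    API; the study itself ran SLMs locally on consumer-grade hardware."""
    prompt = PROMPT_TEMPLATE.format(before=before, after=after)
    answer = query_model(prompt).strip().upper()
    return answer.startswith("NO")
```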
arXiv Detail & Related papers (2025-02-25T18:52:28Z)
- Improving Zero-Shot Object-Level Change Detection by Incorporating Visual Correspondence [13.479857959236345]
Existing change-detection approaches suffer from three major limitations. We introduce a novel method that leverages change correspondences during training to improve change detection accuracy. We are also the first to predict correspondences between pairs of detected changes using estimated homography and the Hungarian algorithm.
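The last sentence names two concrete ingredients: an estimated homography and the Hungarian algorithm. A minimal sketch of matching detected changes across two views with OpenCV and SciPy follows; representing detections as center points and using Euclidean cost are our assumptions, not the paper's exact formulation.

```python
import numpy as np
import cv2
from scipy.optimize import linear_sum_assignment

def match_changes(pts1, pts2, boxes1, boxes2):
    """Match change detections across two views of the same scene.

    pts1/pts2: corresponding keypoints (N x 2) used to estimate a homography;
    boxes1/boxes2: centers of detected change regions (M x 2 and K x 2)."""
    H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)
    # Warp the first view's detections into the second view's frame.
    warped = cv2.perspectiveTransform(
        boxes1.reshape(-1, 1, 2).astype(np.float32), H).reshape(-1, 2)
    # Hungarian algorithm on pairwise distances gives an optimal 1-to-1 matching.
    cost = np.linalg.norm(warped[:, None, :] - boxes2[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```

Detections left unmatched by the assignment would correspond to changes visible in only one view.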
arXiv Detail & Related papers (2025-01-09T20:02:10Z)
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms based on low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
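Assuming "magnitude-based" refers to magnitude-based pruning, as the phrasing suggests, here is a minimal NumPy sketch of the operation under analysis: keep the largest-magnitude weights and zero the rest.

```python
import numpy as np

def magnitude_prune(W: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries of a weight matrix.

    sparsity=0.9 keeps only the largest 10% of weights by |value|."""
    threshold = np.quantile(np.abs(W), sparsity)
    return np.where(np.abs(W) >= threshold, W, 0.0)

W = np.random.randn(64, 64)
W_sparse = magnitude_prune(W, sparsity=0.9)
print(f"nonzero fraction: {(W_sparse != 0).mean():.2f}")  # ~0.10
```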
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
- Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models [49.595973365500775]
We assess the robustness of 11 widely-used adaptation methods across 4 vision-language datasets under multimodal corruptions.
Our analysis reveals that adaptation methods are more sensitive to text corruptions than to visual corruptions.
Contrary to expectations, our findings indicate that increasing the number of adaptation data and parameters does not guarantee enhanced robustness.
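A minimal harness for this kind of corruption study might look as follows; the `model`, `dataset`, and corruption functions are placeholders for illustration, not the benchmark's actual suite.

```python
def robustness_report(model, dataset, corruptions):
    """Accuracy per corruption type.

    `dataset` yields ((image, text), label) pairs, and each corruption
    maps (image, text) -> (image, text), e.g. blurring the image or
    injecting typos into the text."""
    report = {}
    for name, corrupt in corruptions.items():
        correct = total = 0
        for (image, text), label in dataset:
            pred = model(*corrupt(image, text))
            correct += int(pred == label)
            total += 1
        report[name] = correct / total
    return report
```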
arXiv Detail & Related papers (2023-06-03T11:05:04Z)
- Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
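A comparison like this is straightforward to reproduce with the Hugging Face transformers library; a minimal sketch follows, where the checkpoint name is a common public example rather than necessarily the paper's exact model.

```python
from transformers import pipeline

# Any RoBERTa checkpoint fine-tuned for sentiment works here; this one is
# a widely used public example, not necessarily the model from the paper.
classifier = pipeline("sentiment-analysis",
                      model="cardiffnlp/twitter-roberta-base-sentiment-latest")

print(classifier("The build finally passed after three days of debugging."))
# e.g. [{'label': 'positive', 'score': 0.9...}]
```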
arXiv Detail & Related papers (2023-03-13T17:12:03Z)
- Self-Supervised Learning for Group Equivariant Neural Networks [75.62232699377877]
Group equivariant neural networks are models whose structure is restricted to commute with transformations on the input.
We propose two concepts for self-supervised tasks: equivariant pretext labels and invariant contrastive loss.
Experiments on standard image recognition benchmarks demonstrate that equivariant neural networks exploit the proposed self-supervised tasks.
arXiv Detail & Related papers (2023-03-08T08:11:26Z)
- Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning [6.269370220586248]
In this paper, we propose a novel technique to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning.
We have conducted experiments on a dataset of 885 patches generated on real-world programs in Defects4J.
Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline.
arXiv Detail & Related papers (2023-01-03T14:16:32Z)
- Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
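The deletion-only vs. non-deletion split can be computed mechanically from a diff. A minimal sketch of one plausible classification rule follows; it is our assumption about what the metric distinguishes, not the paper's exact definition.

```python
import difflib

def is_deletion_only_fix(buggy: str, fixed: str) -> bool:
    """True if the fix removes lines but adds none (a deletion-only fix)."""
    diff = list(difflib.ndiff(buggy.splitlines(), fixed.splitlines()))
    added = any(line.startswith("+ ") for line in diff)
    removed = any(line.startswith("- ") for line in diff)
    return removed and not added
```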
arXiv Detail & Related papers (2021-04-16T05:27:04Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
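One standard recipe for scoring lexical semantic change is to align the embedding spaces of two time periods and measure per-word cosine distance. A minimal sketch using orthogonal Procrustes alignment follows; the summary says the method works with any alignment method, so this particular choice is ours, for illustration.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def semantic_change_scores(E_old, E_new, vocab):
    """E_old/E_new: (V x d) embedding matrices over a shared vocabulary.
    Returns cosine distance per word after aligning the old space to the new."""
    R, _ = orthogonal_procrustes(E_old, E_new)  # rotation minimizing ||E_old @ R - E_new||
    aligned = E_old @ R
    cos = np.sum(aligned * E_new, axis=1) / (
        np.linalg.norm(aligned, axis=1) * np.linalg.norm(E_new, axis=1))
    return dict(zip(vocab, 1.0 - cos))  # larger distance = stronger shift
```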
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Learning the Relation between Code Features and Code Transforms with Structured Prediction [13.62633524166298]
We present the first approach for structurally predicting code transforms at the level of AST nodes using conditional random fields (CRFs).
Our approach first learns offline a probabilistic model that captures how certain code transforms are applied to certain AST nodes, and then uses the learned model to predict transforms for arbitrary new, unseen code snippets.
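In the same spirit, a minimal sketch of sequence labeling with a linear-chain CRF via sklearn-crfsuite; the node features and transform labels below are invented for illustration and differ from the paper's own model.

```python
import sklearn_crfsuite

# One training sequence: AST nodes in traversal order, each as a feature dict,
# labeled with the transform to apply ('none' for most nodes). Both the
# features and the label set here are hypothetical.
X_train = [[
    {"type": "MethodDecl", "depth": 1, "n_children": 3},
    {"type": "IfStmt",     "depth": 2, "n_children": 2},
    {"type": "Return",     "depth": 3, "n_children": 1},
]]
y_train = [["none", "wrap_null_check", "none"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # transform label predicted per AST node
```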
arXiv Detail & Related papers (2019-07-22T12:42:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.