Test Case Recommendations with Distributed Representation of Code Syntactic Features
- URL: http://arxiv.org/abs/2310.03174v1
- Date: Wed, 4 Oct 2023 21:42:01 GMT
- Title: Test Case Recommendations with Distributed Representation of Code Syntactic Features
- Authors: Mosab Rezaei, Hamed Alhoori, Mona Rahimi
- Abstract summary: We propose an automated approach which exploits both structural and semantic properties of source code methods and test cases.
The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations.
The model computes cosine similarity between the method's embedding and the previously-embedded training instances.
- Score: 2.225268436173329
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Frequent modifications of unit test cases are inevitable due to
continuous changes in the software's underlying source code, design, and
requirements. Since manually maintaining software test suites is tedious,
time-consuming, and costly, automating the generation and maintenance of unit
tests will significantly improve the effectiveness and efficiency of software
testing processes.
To this end, we propose an automated approach that exploits both structural
and semantic properties of source code methods and test cases to recommend the
most relevant and useful unit tests to developers. The proposed approach
initially trains a neural network to transform method-level source code, as
well as unit tests, into distributed representations (embedding vectors) while
preserving the importance of the code's structure. Given the semantic and
structural properties of a method, the approach computes the cosine similarity
between the method's embedding and the previously embedded training instances.
Then, according to the similarity scores between the embedding vectors, the
model identifies the methods with the closest embeddings and recommends their
associated unit tests as the most similar ones.
The results on the Methods2Test dataset showed that, while there is no
guarantee that similar methods have similarly relevant test cases, the
proposed approach extracts the most similar existing test cases for a given
method in the dataset, and evaluations show that the recommended test cases
reduce the developers' effort in generating the expected test cases.
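The retrieval step described above can be sketched as a cosine-similarity nearest-neighbor lookup. The sketch below is illustrative only: the embedding vectors, method names, and test names are toy placeholders, not the paper's trained model or the Methods2Test data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recommend_tests(query_emb, method_embs, tests, k=2):
    """Rank the training methods by cosine similarity to the query
    method's embedding and return the unit tests of the k closest.
    All data here is hypothetical, standing in for the network's output."""
    scored = sorted(
        ((cosine(query_emb, emb), t) for emb, t in zip(method_embs, tests)),
        reverse=True,
    )
    return [t for _, t in scored[:k]]

# Toy embeddings for three "training" methods and their unit tests.
method_embs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
tests = ["test_parse", "test_tokenize", "test_render"]
query = [1.0, 0.05, 0.0]  # embedding of the method needing a test
print(recommend_tests(query, method_embs, tests, k=2))
```

In this toy example the query embedding is closest to the first two method embeddings, so their tests are recommended first.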
Related papers
- Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes.
Applying semi-supervised detectors in such settings can lead to misclassifying OOD class as ID classes.
We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector)
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
- LLM-based Unit Test Generation via Property Retrieval [26.906316611858518]
Property-Based Retrieval Augmentation extends LLM-based Retrieval-Augmented Generation beyond basic vector, text similarity, and graph-based methods.
Our approach considers task-specific context and introduces a tailored property retrieval mechanism.
We implement this approach in a tool called APT, which sequentially performs preprocessing, property retrieval, and unit test generation.
arXiv Detail & Related papers (2024-10-17T13:33:12Z)
- Introducing Ensemble Machine Learning Algorithms for Automatic Test Case Generation using Learning Based Testing [0.0]
Ensemble methods are powerful machine learning algorithms that combine multiple models to enhance prediction capabilities and reduce generalization errors.
This study aims to systematically investigate the combination of ensemble methods and base classifiers for model inference in a Learning Based Testing (LBT) algorithm to generate fault-detecting test cases for SUTs as a proof of concept.
arXiv Detail & Related papers (2024-09-06T23:24:59Z)
- Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
- Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z)
- Measuring Software Testability via Automatically Generated Test Cases [8.17364116624769]
We propose a new approach to pursuing testability measurements based on software metrics.
Our approach exploits automatic test generation and mutation analysis to quantify the evidence about the relative hardness of developing effective test cases.
arXiv Detail & Related papers (2023-07-30T09:48:51Z)
- Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification [77.0114672086012]
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction.
We present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets.
arXiv Detail & Related papers (2023-07-06T16:59:53Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Test case prioritization using test case diversification and fault-proneness estimations [0.0]
We propose an approach for TCP that takes into account test case coverage data, bug history, and test case diversification.
The diversification of test cases is preserved by incorporating fault-proneness on a clustering-based approach scheme.
The experiments show that the proposed methods are superior to coverage-based TCP methods.
arXiv Detail & Related papers (2021-06-19T15:55:24Z)
- CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.