MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing
- URL: http://arxiv.org/abs/2408.15815v1
- Date: Wed, 28 Aug 2024 14:24:48 GMT
- Title: MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing
- Authors: Congying Xu, Songqiang Chen, Jiarong Wu, Shing-Chi Cheung, Valerio Terragni, Hengcheng Zhu, Jialun Cao
- Abstract summary: We propose MR-Adopt to automatically deduce the input transformation from the hard-coded source and follow-up inputs.
By incorporating MR-Adopt-generated input transformations, encoded MR-based test cases can effectively enhance the test adequacy.
- Score: 9.50422798204681
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While a recent study reveals that many developer-written test cases can encode a reusable Metamorphic Relation (MR), over 70% of them directly hard-code the source input and follow-up input in the encoded relation. Such encoded MRs, which do not contain an explicit input transformation to turn source inputs into corresponding follow-up inputs, cannot be reused with new source inputs to enhance test adequacy. In this paper, we propose MR-Adopt (Automatic Deduction Of inPut Transformation) to automatically deduce the input transformation from the hard-coded source and follow-up inputs, aiming to enable the encoded MRs to be reused with new source inputs. With typically only one pair of source and follow-up inputs available in an MR-encoded test case as an example, we leverage LLMs to understand the intention of the test case and generate additional examples of source-follow-up input pairs. This helps guide the generation of input transformations that generalize to multiple source inputs. Besides, to mitigate the issue that LLMs generate erroneous code, we refine LLM-generated transformations by removing MR-irrelevant code elements with data-flow analysis. Finally, we assess candidate transformations against the encoded output relations and select the best transformation as the result. Evaluation results show that MR-Adopt can generate input transformations applicable to all experimental source inputs for 72.00% of encoded MRs, which is 33.33% more than using vanilla GPT-3.5. By incorporating MR-Adopt-generated input transformations, encoded MR-based test cases can effectively enhance test adequacy, increasing line coverage and mutation score by 10.62% and 18.91%, respectively.
Related papers
- Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output [49.893971654861424]
We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG).
We compute a factuality score that can be thresholded to yield a binary decision.
Our experiments show high area under the ROC curve (AUC) across a wide range of relevant open source datasets.
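The score-then-threshold setup described above can be sketched as follows. This is not Provenance's actual scorer; the token-overlap score here is a crude, hypothetical stand-in for a learned factuality model.

```python
def factuality_score(answer: str, evidence: str) -> float:
    # Crude proxy: fraction of answer tokens supported by retrieved evidence.
    # A real fact-checker would use a learned model, not token overlap.
    ans = set(answer.lower().split())
    ev = set(evidence.lower().split())
    return len(ans & ev) / max(len(ans), 1)

def is_factual(answer: str, evidence: str, threshold: float = 0.5) -> bool:
    # Thresholding the continuous score yields the binary decision.
    return factuality_score(answer, evidence) >= threshold
```

Sweeping `threshold` over held-out data is what produces the ROC curve whose area the paper reports.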
arXiv Detail & Related papers (2024-11-01T20:44:59Z) - Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study [52.91899050612153]
We study transformers within pre-trained language models (PLMs) when repurposed as encoders for Automatic Speech Recognition (ASR).
Our findings reveal a notable improvement in Character Error Rate (CER) and Word Error Rate (WER) across diverse ASR tasks when transformers from pre-trained LMs are incorporated.
This underscores the potential of leveraging the semantic prowess embedded within pre-trained transformers to advance ASR systems' capabilities.
arXiv Detail & Related papers (2024-09-26T11:31:18Z) - DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
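A toy sketch of the extract-and-perturb idea (not DARG's implementation): a reasoning graph is a DAG of computation steps, and adding a node raises complexity while the graph remains evaluable to a new ground-truth answer. The graph encoding and operations here are assumptions for illustration.

```python
import copy

# Reasoning graph for "2 + 3 = 5, then 5 * 4 = 20": each node maps to
# (operation, operands), where operands are constants or other node ids.
graph = {
    "n1": ("add", [2, 3]),
    "n2": ("mul", ["n1", 4]),
}

def evaluate(graph, node):
    # Recursively evaluate a node, resolving references to earlier steps.
    op, args = graph[node]
    vals = [evaluate(graph, a) if isinstance(a, str) else a for a in args]
    return vals[0] + vals[1] if op == "add" else vals[0] * vals[1]

def perturb(graph, new_id, op, args):
    # Extend the graph with one extra reasoning step: a harder test sample
    # whose correct answer can still be computed mechanically.
    g = copy.deepcopy(graph)
    g[new_id] = (op, args)
    return g

harder = perturb(graph, "n3", "add", ["n2", 7])
print(evaluate(harder, "n3"))  # 27
```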
arXiv Detail & Related papers (2024-06-25T04:27:53Z) - MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation [15.382745718541063]
This paper presents MeTMaP, a framework developed to identify false vector matching in LLM-augmented generation systems.
MeTMaP is based on the idea that semantically similar texts should match and dissimilar ones should not.
Our evaluation of MeTMaP over 203 vector matching configurations, involving 29 embedding models and 7 distance metrics, uncovers significant inaccuracies.
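MeTMaP's core property can be expressed as a metamorphic check: under a chosen embedding and distance metric, a paraphrase of an anchor text should sit closer than an unrelated text. The bag-of-words embedding below is a toy stand-in, not one of the 29 embedding models evaluated.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb)

def metamorphic_violation(anchor, similar, dissimilar):
    # Violation: a paraphrase ends up no closer than an unrelated text,
    # flagging a false vector match for this embedding/metric pair.
    va, vs, vd = embed(anchor), embed(similar), embed(dissimilar)
    return cosine_distance(va, vs) >= cosine_distance(va, vd)
```

Running this check over many (anchor, similar, dissimilar) triples and many embedding/metric pairs is, in spirit, what the evaluation over 203 configurations does.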
arXiv Detail & Related papers (2024-02-22T12:13:35Z) - Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example [10.635856134931702]
Large Language Models (LLMs) are trained on vast code datasets.
We identify best practices for using LLMs to generate code variants meeting criteria of correctness, usefulness, and applicability.
Implementing these in PyCraft, we achieved an F-measure of 96.6% in identifying correct variants, expanding inputs by 58x on average, and automating changes to increase target codes by up to 39x.
arXiv Detail & Related papers (2024-02-11T09:45:00Z) - It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition [70.77292069313154]
Large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.
In this work, we aim to overcome such a limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF).
arXiv Detail & Related papers (2024-02-08T07:21:45Z) - Prompt Optimization via Adversarial In-Context Learning [51.18075178593142]
adv-ICL is implemented as a two-player game between a generator and a discriminator.
The generator tries to generate realistic enough output to fool the discriminator.
We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques.
arXiv Detail & Related papers (2023-12-05T09:44:45Z) - MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases [9.00297842984345]
We propose MR-Scout to automatically synthesize MRs from test cases in open-source software projects.
Over 97% of codified MRs are of high quality for automated test case generation.
Our qualitative study shows that 55.76% to 76.92% of codified MRs are easily comprehensible for developers.
arXiv Detail & Related papers (2023-04-15T12:53:32Z) - On-the-fly Text Retrieval for End-to-End ASR Adaptation [9.304386210911822]
We propose augmenting a transducer-based ASR model with a retrieval language model, which retrieves from an external text corpus plausible completions for a partial ASR hypothesis.
Our experiments show that the proposed model significantly improves the performance of a transducer baseline on a pair of question-answering datasets.
arXiv Detail & Related papers (2023-03-20T08:54:40Z) - Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.