Related papers: Towards Generating Executable Metamorphic Relations Using Large Language Models

URL: http://arxiv.org/abs/2401.17019v3
Date: Fri, 11 Oct 2024 09:07:22 GMT
Title: Towards Generating Executable Metamorphic Relations Using Large Language Models
Authors: Seung Yeob Shin, Fabrizio Pastore, Domenico Bianculli, Alexandra Baicoianu,
Abstract summary: We propose an approach for automatically deriving executable MRs from requirements using large language models (LLMs). To assess the feasibility of our approach, we conducted a questionnaire-based survey in collaboration with Siemens Industry Software.
Score: 46.26208489175692
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Metamorphic testing (MT) has proven to be a successful solution to automating testing and addressing the oracle problem. However, it entails manually deriving metamorphic relations (MRs) and converting them into an executable form; these steps are time-consuming and may prevent the adoption of MT. In this paper, we propose an approach for automatically deriving executable MRs (EMRs) from requirements using large language models (LLMs). Instead of merely asking the LLM to produce EMRs, our approach relies on a few-shot prompting strategy to instruct the LLM to perform activities in the MT process, by providing requirements and API specifications, as one would do with software engineers. To assess the feasibility of our approach, we conducted a questionnaire-based survey in collaboration with Siemens Industry Software, a worldwide leader in providing industry software and services, focusing on four of their software applications. Additionally, we evaluated the accuracy of the generated EMRs for a Web application. The outcomes of our study are highly promising, as they demonstrate the capability of our approach to generate MRs and EMRs that are both comprehensible and pertinent for testing purposes.
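
For readers unfamiliar with the terminology, the following is a minimal sketch of what an executable metamorphic relation can look like. It is not taken from the paper and does not reflect its LLM-based pipeline. It encodes the textbook relation sin(x) = sin(pi - x) as a check that needs no expected output for any single input. The names `check_sine_mr` and `buggy_sin` are hypothetical and exist only for illustration.

```python
import math
import random


def check_sine_mr(sut, trials=100, tol=1e-9, seed=0):
    """Return the source inputs for which sut(x) and sut(pi - x) disagree.

    The relation sin(x) == sin(pi - x) is the metamorphic relation; no
    expected value for any individual input is required (the oracle problem).
    """
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)   # source input
        follow_up = math.pi - x        # follow-up input derived from the source
        if not math.isclose(sut(x), sut(follow_up), abs_tol=tol):
            violations.append(x)
    return violations


def buggy_sin(x):
    # Hypothetical faulty implementation: adds an offset for large inputs.
    return math.sin(x) + (0.01 if x > 5 else 0.0)


if __name__ == "__main__":
    print("violations for math.sin:", len(check_sine_mr(math.sin)))
    print("violations for buggy_sin:", len(check_sine_mr(buggy_sin)))
```

Running the check against `math.sin` should report no violations, while the deliberately faulty variant is flagged whenever the source or the follow-up input exceeds 5.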

Related papers

AutoEDA: Enabling EDA Flow Automation through Microservice-Based LLM Agents [15.41283323575065]
AutoEDA is a framework for EDA automation that leverages paralleled learning through the Model Context Protocol (MCP) specific for standardized and scalable natural language experience. Results from experiments show improvements in automation accuracy and efficiency, as well as script quality when compared to existing methods.
arXiv Detail & Related papers (2025-08-01T18:23:57Z)
Querying Large Automotive Software Models: Agentic vs. Direct LLM Approaches [3.549427092296418]
Large language models (LLMs) offer new opportunities for interacting with complex software artifacts, such as software models, through natural language. This paper investigates two approaches for leveraging LLMs to answer questions over software models. We evaluate these approaches using an Ecore metamodel designed for timing analysis and software optimization in automotive domains.
arXiv Detail & Related papers (2025-06-16T07:34:28Z)
Beyond Formal Semantics for Capabilities and Skills: Model Context Protocol in Manufacturing [0.12289361708127876]
We present an alternative approach based on the recently introduced Model Context Protocol (MCP). MCP allows systems to expose functionality through a standardized interface that is directly consumable by LLM-based agents.
arXiv Detail & Related papers (2025-06-12T13:02:16Z)
Self-Steering Language Models [113.96916935955842]
DisCIPL is a method for "self-steering" language models. DisCIPL uses a Planner model to generate a task-specific inference program. Our work opens up a design space of highly-parallelized Monte Carlo inference strategies.
arXiv Detail & Related papers (2025-04-09T17:54:22Z)
Agentic Mixture-of-Workflows for Multi-Modal Chemical Search [0.0]
Large language models (LLMs) have demonstrated promising reasoning and automation capabilities across various domains. We introduce CRAG-MoW, a novel paradigm that orchestrates multiple agentic workflows employing distinct CRAG strategies. We benchmark CRAG-MoWs across small molecules, polymers, and chemical reactions, as well as multi-modal nuclear magnetic resonance (NMR) spectral retrieval.
arXiv Detail & Related papers (2025-02-26T23:48:02Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
The Potential of LLMs in Automating Software Testing: From Generation to Reporting [0.0]
Manual testing, while effective, can be time consuming and costly, leading to an increased demand for automated methods. Recent advancements in Large Language Models (LLMs) have significantly influenced software engineering. This paper explores an agent-oriented approach to automated software testing, using LLMs to reduce human intervention and enhance testing efficiency.
arXiv Detail & Related papers (2024-12-31T02:06:46Z)
Creation and Evaluation of a Food Product Image Dataset for Product Property Extraction [39.58317527488534]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions. Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part. We show in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2024-11-15T21:29:05Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLM) to lessen such burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
Re-Thinking Process Mining in the AI-Based Agents Era [39.58317527488534]
Large Language Models (LLMs) have emerged as powerful conversational interfaces, and their application in process mining (PM) tasks has shown promising results. This paper proposes utilizing the AI-Based Agents (AgWf) paradigm to enhance the effectiveness of PM on LLMs. We examine various implementations of AgWf and the types of AI-based tasks involved.
arXiv Detail & Related papers (2024-08-14T10:14:18Z)
RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents [27.807695570974644]
We propose a novel method, RePrompt, which takes a "gradient descent"-like approach to optimize the step-by-step instructions in the prompts given to LLM agents. By leveraging intermediate feedback, RePrompt can optimize the prompt without the need for a final solution checker.
arXiv Detail & Related papers (2024-06-17T01:23:11Z)
ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling [15.67321902882617]
We propose a viable path for training open-source LLMs capable of optimization modeling and developing solver codes. This work also introduces IndustryOR, the first industrial benchmark for evaluating LLMs in solving practical OR problems.
arXiv Detail & Related papers (2024-05-28T01:55:35Z)
Using Large Language Models to Understand Telecom Standards [35.343893798039765]
Large Language Models (LLMs) may provide faster access to relevant information. We evaluate the capability of state-of-the-art LLMs to be used as Question Answering (QA) assistants. Results show that LLMs can be used as a credible reference tool on telecom technical documents.
arXiv Detail & Related papers (2024-04-02T09:54:51Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions. Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part. We show in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z)
Just Tell Me: Prompt Engineering in Business Process Management [63.08166397142146]
GPT-3 and other language models (LMs) can effectively address various natural language processing (NLP) tasks. We argue that prompt engineering can help bring the capabilities of LMs to BPM research.
arXiv Detail & Related papers (2023-04-14T14:55:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.