The Software Engineering Simulations Lab: Agentic AI for RE Quality Simulations
- URL: http://arxiv.org/abs/2511.17762v1
- Date: Fri, 21 Nov 2025 20:19:08 GMT
- Title: The Software Engineering Simulations Lab: Agentic AI for RE Quality Simulations
- Authors: Henning Femmer, Ivan Esau,
- Abstract summary: Quality in Requirements Engineering (RE) is still predominantly anecdotal and intuition-driven. With the advent of AI-based development, the requirements quality factors may change. This paper contributes a first concept, a research roadmap, a prototype, and a first feasibility study for RE simulations with agentic AI.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context and motivation. Quality in Requirements Engineering (RE) is still predominantly anecdotal and intuition-driven. Creating a solid requirements quality model requires broad sets of empirical evidence to evaluate quality factors and their context. Problem. However, empirical data on the detailed effects of requirements quality defects is scarce, since it is costly to obtain. Furthermore, with the advent of AI-based development, the requirements quality factors may change: Requirements are no longer only consumed by humans, but increasingly also by AI agents, which might lead to a different efficient and effective requirements style. Principal ideas. We propose to extend the RE research toolbox with Agentic AI simulations, in which software engineering (SE) processes are replicated by standardized agents in stochastic, dynamic, event-driven, qualitative simulations. We argue that their speed and simplicity makes them a valuable addition to RE research, although limitations in replicating human behavior need to be studied and understood. Contribution. This paper contributes a first concept, a research roadmap, a prototype, and a first feasibility study for RE simulations with agentic AI. Study results indicate that even a naive implementation leads to executable simulations, encouraging technical improvements along with broader application in RE research.
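The abstract describes SE processes being replicated by standardized agents in stochastic, event-driven simulations. As a rough illustration only, the sketch below shows what such a simulation loop could look like; the paper's actual prototype design is not given in the abstract, so the agent roles (`Agent`, `simulate`, `defect_detection_rate`) are hypothetical and the LLM call is stubbed with a random draw.

```python
import heapq
import random

class Agent:
    """A standardized simulation agent (e.g. a developer or tester role).

    Hypothetical sketch: a real agentic simulation would delegate
    `process` to an LLM; here it is stubbed with a probability.
    """
    def __init__(self, role, defect_detection_rate):
        self.role = role
        self.defect_detection_rate = defect_detection_rate

    def process(self, requirement):
        # Stub for an LLM call: flag a defect with some probability.
        if requirement["defective"] and random.random() < self.defect_detection_rate:
            return "defect_found"
        return "accepted"

def simulate(requirements, agents, seed=0):
    """Pop events in timestamp order; each event is one requirement
    flowing through the agent pipeline. The random timestamps make the
    run stochastic but reproducible via the seed."""
    random.seed(seed)
    events = [(random.random(), i, req) for i, req in enumerate(requirements)]
    heapq.heapify(events)
    log = []
    while events:
        t, i, req = heapq.heappop(events)
        for agent in agents:
            outcome = agent.process(req)
            log.append((round(t, 3), agent.role, req["id"], outcome))
            if outcome == "defect_found":
                break  # a found defect blocks downstream processing
    return log

reqs = [{"id": "R1", "defective": False}, {"id": "R2", "defective": True}]
team = [Agent("developer", 0.6), Agent("tester", 0.9)]
for entry in simulate(reqs, team):
    print(entry)
```

Replacing the stubbed `process` with actual LLM calls, and the boolean defect flag with real requirements text, would turn this event loop into the kind of qualitative simulation the paper proposes.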
Related papers
- The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research [56.80927148740585]
We address the challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent. Our work demonstrates the potential of AI agents to transform research evaluation and pave the way for rigorous scientific practices.
arXiv Detail & Related papers (2026-02-05T19:00:02Z) - PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies [88.78188489161028]
We introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS). PolaRiS is a scalable real-to-sim framework for high-fidelity simulated robot evaluation. We show that PolaRiS evaluations provide a much stronger correlation to real world generalist policy performance than existing simulated benchmarks.
arXiv Detail & Related papers (2025-12-18T18:49:41Z) - Generative AI in Simulation-Based Test Environments for Large-Scale Cyber-Physical Systems: An Industrial Study [2.432409923443071]
Quality assurance for large-scale cyber-physical systems relies on sophisticated test activities. Recent advances in generative AI have led to tools that can produce executable test cases for software systems. The application of generative AI techniques to simulation-based testing of large-scale cyber-physical systems remains underexplored.
arXiv Detail & Related papers (2025-12-05T08:09:13Z) - Q-REAL: Towards Realism and Plausibility Evaluation for AI-Generated Content [71.46991494014382]
We introduce Q-Real, a novel dataset for fine-grained evaluation of realism and plausibility in AI-generated images. Q-Real consists of 3,088 images generated by popular text-to-image models. We construct Q-Real Bench to evaluate them on two tasks: judgment and grounding with reasoning.
arXiv Detail & Related papers (2025-11-21T02:43:17Z) - Towards Human-AI Synergy in Requirements Engineering: A Framework and Preliminary Study [2.195918681143262]
This study introduces the Human-AI RE Synergy Model (HARE-SM). The model integrates AI-driven analysis with human oversight to improve requirements elicitation, analysis, and validation. We outline a multi-phase research methodology focused on preparing RE datasets, fine-tuning AI models, and designing collaborative human-AI.
arXiv Detail & Related papers (2025-10-28T22:29:11Z) - YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models [50.35333054932747]
We introduce a novel social simulator called YuLan-OneSim. Users can simply describe and refine their simulation scenarios through natural language interactions with our simulator. We implement 50 default simulation scenarios spanning 8 domains, including economics, sociology, politics, psychology, organization, demographics, law, and communication.
arXiv Detail & Related papers (2025-05-12T14:05:17Z) - MLGym: A New Framework and Benchmark for Advancing AI Research Agents [51.9387884953294]
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing large language models on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents. We evaluate a number of frontier large language models (LLMs), such as Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro, on our benchmark.
arXiv Detail & Related papers (2025-02-20T12:28:23Z) - Work in Progress: AI-Powered Engineering-Bridging Theory and Practice [0.0]
This paper explores how generative AI can help automate and improve key steps in systems engineering. It examines AI's ability to analyze system requirements based on INCOSE's "good requirement" criteria. The research aims to assess AI's potential to streamline engineering processes and improve learning outcomes.
arXiv Detail & Related papers (2025-02-06T17:42:00Z) - Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs [58.18140409409302]
Large Language Models (LLMs) have made substantial strides in structured tasks through Reinforcement Learning (RL). Applying RL in broader domains like chatbots and content generation presents unique challenges. We show a case study of reproducing existing reward model ensemble research using embedding-based reward models.
arXiv Detail & Related papers (2025-02-04T19:37:35Z) - Generative AI in Health Economics and Outcomes Research: A Taxonomy of Key Definitions and Emerging Applications, an ISPOR Working Group Report [12.204470166456561]
Generative AI shows significant potential in health economics and outcomes research (HEOR), enhancing efficiency and productivity and offering novel solutions to complex challenges. Foundation models are promising in automating complex tasks, though challenges remain in scientific reliability, bias, interpretability, and workflow integration.
arXiv Detail & Related papers (2024-10-26T15:42:50Z) - Facilitating Sim-to-real by Intrinsic Stochasticity of Real-Time Simulation in Reinforcement Learning for Robot Manipulation [1.6686307101054858]
We investigate the properties of the intrinsic stochasticity of real-time simulation (RT-IS) of off-the-shelf simulation software.
RT-IS requires less randomization, is not task-dependent, and achieves better generalizability than the conventional domain-randomization-powered agents.
arXiv Detail & Related papers (2023-04-12T12:15:31Z)