A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development
- URL: http://arxiv.org/abs/2505.07664v1
- Date: Mon, 12 May 2025 15:31:16 GMT
- Title: A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development
- Authors: Werner Geyer, Jessica He, Daita Sarkar, Michelle Brachman, Chris Hammond, Jennifer Heins, Zahra Ashktorab, Carlos Rosemberg, Charlie Hill
- Abstract summary: We investigate opportunities for large language models to evaluate agile epic quality in a global company.
High levels of satisfaction indicate that agile epics are a new, viable application of AI evaluations.
- Score: 7.239833814703049
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The broad availability of generative AI offers new opportunities to support various work domains, including agile software development. Agile epics are a key artifact for product managers to communicate requirements to stakeholders. However, in practice, they are often poorly defined, leading to churn, delivery delays, and cost overruns. In this industry case study, we investigate opportunities for large language models (LLMs) to evaluate agile epic quality in a global company. Results from a user study with 17 product managers indicate how LLM evaluations could be integrated into their work practices, including perceived values and usage in improving their epics. High levels of satisfaction indicate that agile epics are a new, viable application of AI evaluations. However, our findings also outline challenges, limitations, and adoption barriers that can inform both practitioners and researchers on the integration of such evaluations into future agile work practices.
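The paper studies LLM-based quality evaluations of epics but does not reproduce its prompt, model, or rubric in this listing. As a rough, hedged illustration of the underlying pattern (an LLM-as-judge scoring an artifact against explicit criteria), the sketch below assumes an OpenAI-style chat-completions API; the `CRITERIA` list, the prompt wording, and the `evaluate_epic` helper are illustrative assumptions, not the study's actual instrument.

```python
# Minimal sketch of an LLM-based epic quality check (illustrative only;
# the study's actual prompt, model, and rubric are not published here).
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

# Illustrative quality criteria; a real rubric would come from the team's
# definition of ready or the study's evaluation instrument.
CRITERIA = [
    "Clear problem statement and business value",
    "Well-defined scope and out-of-scope items",
    "Testable acceptance criteria",
    "Identified stakeholders and dependencies",
]

def evaluate_epic(epic_text: str, model: str = "gpt-4o") -> dict:
    """Ask the model to score an epic against each criterion and suggest fixes."""
    prompt = (
        "You are reviewing an agile epic for quality. For each criterion, "
        "give a score from 1 (poor) to 5 (excellent) and one concrete suggestion.\n"
        'Respond as JSON: {"scores": {criterion: int}, "suggestions": [str]}.\n\n'
        "Criteria:\n" + "\n".join(f"- {c}" for c in CRITERIA) +
        "\n\nEpic:\n" + epic_text
    )
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # ask for parseable output
    )
    return json.loads(resp.choices[0].message.content)

if __name__ == "__main__":
    report = evaluate_epic("As a product team, we want a billing dashboard ...")
    print(json.dumps(report, indent=2))
```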
Related papers
- The SPACE of AI: Real-World Lessons on AI's Impact on Developers [0.807084206814932]
We study how developers perceive AI's influence across the dimensions of the SPACE framework: Satisfaction, Performance, Activity, Collaboration and Efficiency.
We find that AI is broadly adopted and widely seen as enhancing productivity, particularly for routine tasks.
Developers report increased efficiency and satisfaction, with less evidence of impact on collaboration.
arXiv Detail & Related papers (2025-07-31T21:45:54Z)
- The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority.
We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
- Exploring Prompt Patterns in AI-Assisted Code Generation: Towards Faster and More Effective Developer-AI Collaboration [3.1861081539404137]
This paper explores the application of structured prompt patterns to minimize the number of interactions required for satisfactory AI-assisted code generation.
We analyzed seven distinct prompt patterns to evaluate their effectiveness in reducing back-and-forth communication between developers and AI.
arXiv Detail & Related papers (2025-06-02T12:43:08Z)
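The seven patterns the paper analyzes are not listed in this summary. As a hedged illustration of what a structured prompt pattern can look like, the snippet below front-loads persona, context, constraints, and expected output into one message so the model needs fewer clarification turns; the template fields are assumptions for illustration, not the paper's patterns.

```python
# A minimal, illustrative "structured prompt" template: front-loading the
# details the model cannot guess can reduce the back-and-forth this paper
# measures. The fields below are assumptions, not the seven analyzed patterns.
from dataclasses import dataclass

@dataclass
class StructuredPrompt:
    persona: str      # who the model should act as
    context: str      # project/codebase facts the model cannot guess
    task: str         # the concrete request
    constraints: str  # language, style, and library restrictions
    output: str       # exactly what to return

    def render(self) -> str:
        return (
            f"Act as {self.persona}.\n"
            f"Context: {self.context}\n"
            f"Task: {self.task}\n"
            f"Constraints: {self.constraints}\n"
            f"Return only: {self.output}"
        )

prompt = StructuredPrompt(
    persona="a senior Python developer",
    context="a Flask service that stores users in PostgreSQL via SQLAlchemy",
    task="add an endpoint that returns a user by id",
    constraints="Python 3.11, type hints, no new dependencies",
    output="one code block with the route function and its test",
)
print(prompt.render())
```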
- FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks [52.47895046206854]
FieldWorkArena is a benchmark for agentic AI targeting real-world field work.
This paper defines a new action space that agentic AI should possess for real world work environment benchmarks.
arXiv Detail & Related papers (2025-05-26T08:21:46Z)
- Human-AI Experience in Integrated Development Environments: A Systematic Literature Review [2.1749194587826026]
In-IDE HAX explores the evolving dynamics of Human-Computer Interaction in AI-assisted coding environments.
Our findings reveal that AI-assisted coding enhances developer productivity but also introduces challenges, such as verification overhead, automation bias, and over-reliance.
Concerns about code correctness, security, and maintainability highlight the urgent need for explainability, verification mechanisms, and adaptive user control.
arXiv Detail & Related papers (2025-03-08T12:40:18Z)
- How to Measure Performance in Agile Software Development? A Mixed-Method Study [2.477589198476322]
The study aims to identify challenges that arise when using agile software development performance metrics in practice.
Results show that while performance metrics are widely used in practice, agile software development teams face challenges due to a lack of transparency and standardization, as well as insufficient accuracy.
arXiv Detail & Related papers (2024-07-08T19:53:01Z)
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
- A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation.
Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge.
RA-LLMs have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z)
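As a minimal sketch of the retrieve-then-generate loop the survey covers: index a corpus, retrieve the passages most similar to the query, and prepend them to the generation prompt so the model answers from external knowledge rather than its internal knowledge. The TF-IDF retriever, toy `CORPUS`, and prompt format below are simplifying assumptions; production RA-LLM stacks typically use dense embeddings and a vector store.

```python
# Minimal retrieval-augmented generation loop (illustrative): retrieve the
# top-k passages for a query, then ground the LLM prompt in them. TF-IDF
# keeps this sketch dependency-light and runnable without a vector store.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

CORPUS = [  # stand-in knowledge base, assumed for illustration
    "Agile epics communicate high-level requirements to stakeholders.",
    "Retrieval-augmented generation grounds LLM output in external documents.",
    "Poorly defined epics lead to churn, delivery delays, and cost overruns.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query (TF-IDF cosine)."""
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(CORPUS)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [CORPUS[i] for i in top]

def build_rag_prompt(query: str) -> str:
    """Prepend retrieved passages so the model answers from them, not memory."""
    passages = "\n".join(f"- {p}" for p in retrieve(query))
    return (
        "Answer using only the passages below; say so if they are insufficient.\n"
        f"Passages:\n{passages}\n\nQuestion: {query}"
    )

print(build_rag_prompt("Why do poorly defined epics cause delays?"))
```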
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
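ChatEval's actual agent prompts, personas, and debate protocol are in the paper; the sketch below only illustrates the general shape of multi-agent-debate evaluation, where persona'd referee agents see one another's latest critiques each round before a final aggregation. The `PERSONAS`, the round structure, and the `llm` callable are assumptions, and the callable stands in for any text-in/text-out chat model.

```python
# Shape of a multi-agent-debate evaluator in the spirit of ChatEval
# (illustrative; the paper's prompts, personas, and aggregation differ).
from typing import Callable

PERSONAS = [  # assumed referee roles for illustration
    "a strict fact-checker",
    "a reader focused on clarity and tone",
    "a domain expert judging completeness",
]

def debate_evaluate(
    question: str,
    answer: str,
    llm: Callable[[str], str],
    rounds: int = 2,
) -> list[str]:
    """Each referee critiques the answer, seeing peers' latest critiques."""
    critiques = ["(no comment yet)"] * len(PERSONAS)
    for _ in range(rounds):
        new_critiques = []
        for i, persona in enumerate(PERSONAS):
            peers = "\n".join(c for j, c in enumerate(critiques) if j != i)
            prompt = (
                f"You are {persona} on a review panel.\n"
                f"Question: {question}\nCandidate answer: {answer}\n"
                f"Other referees said:\n{peers}\n"
                "Give a short critique and a 1-10 score."
            )
            new_critiques.append(llm(prompt))
        critiques = new_critiques
    return critiques  # aggregate (e.g., average the scores) as a final step

# Usage with a stub model, just to show the control flow:
print(debate_evaluate("What is RAG?", "A retrieval technique.", lambda p: "Score: 7"))
```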
- Comparing Software Developers with ChatGPT: An Empirical Investigation [0.0]
This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics.
The paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration.
arXiv Detail & Related papers (2023-05-19T17:25:54Z)
- AI for Agile development: a Meta-Analysis [0.0]
This study explores the benefits and challenges of integrating Artificial Intelligence with Agile software development methodologies.
The review helped identify critical challenges, such as the need for specialised socio-technical expertise.
Further research is needed to better understand its impact on processes and practitioners, and to address the indirect challenges associated with its implementation.
arXiv Detail & Related papers (2023-05-14T08:10:40Z)
- OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.