Exploring LLMs Impact on Student-Created User Stories and Acceptance Testing in Software Development
- URL: http://arxiv.org/abs/2502.02675v1
- Date: Tue, 04 Feb 2025 19:35:44 GMT
- Title: Exploring LLMs Impact on Student-Created User Stories and Acceptance Testing in Software Development
- Authors: Allan Brockenbrough, Henry Feild, Dominic Salinas
- Abstract summary: This study investigates how LLMs (large language models) affect undergraduate software engineering students' ability to transform user feedback into user stories. Students, working individually, were asked to analyze user feedback comments, appropriately group related items, and create user stories. We found that LLMs help students develop valuable stories with well-defined acceptance criteria.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Agile software development methodology, a user story describes a new feature or functionality from an end user's perspective. The user story details may also incorporate acceptance testing criteria, which can be developed through negotiation with users. When creating stories from user feedback, the software engineer may maximize their usefulness by considering story attributes, including scope, independence, negotiability, and testability. This study investigates how LLMs (large language models), with guided instructions, affect undergraduate software engineering students' ability to transform user feedback into user stories. Students, working individually, were asked to analyze user feedback comments, appropriately group related items, and create user stories following the principles of INVEST, a framework for assessing user stories. We found that LLMs help students develop valuable stories with well-defined acceptance criteria. However, students tend to perform better without LLMs when creating user stories with an appropriate scope.
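To make the abstract's central objects concrete, the sketch below shows one hypothetical way a user story and its acceptance criteria could be represented, along with a rough check of the INVEST "Testable" property. This is an illustrative assumption, not code or data from the paper; the class name, story text, and criteria are invented for the example.

```python
# Illustrative sketch only (not from the paper): a minimal representation of a
# user story with acceptance criteria and a rough INVEST "Testable" check.
from dataclasses import dataclass, field


@dataclass
class UserStory:
    role: str          # "As a <role> ..."
    feature: str       # "... I want <feature> ..."
    benefit: str       # "... so that <benefit>."
    acceptance_criteria: list[str] = field(default_factory=list)

    def as_text(self) -> str:
        return f"As a {self.role}, I want {self.feature} so that {self.benefit}."

    def is_testable(self) -> bool:
        # Crude proxy for INVEST's "Testable": the story names at least one
        # concrete acceptance criterion that could become an acceptance test.
        return len(self.acceptance_criteria) > 0


# Hypothetical example derived from the kind of user feedback described above.
story = UserStory(
    role="course instructor",
    feature="to export student feedback grouped by theme",
    benefit="recurring requests can become backlog items",
    acceptance_criteria=[
        "Given feedback comments exist, when I choose Export, a CSV of grouped themes is produced.",
        "Each exported theme lists the comments assigned to it.",
    ],
)

print(story.as_text())
print("Testable:", story.is_testable())
```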
Related papers
- Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale [51.9706400130481]
Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks.
PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories.
We evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile.
arXiv Detail & Related papers (2025-04-19T08:16:10Z) - UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering [39.79275025010785]
UQABench is a benchmark designed to evaluate the effectiveness of user embeddings in prompting large language models for personalization.
We conduct extensive experiments on various state-of-the-art methods for modeling user embeddings.
arXiv Detail & Related papers (2025-02-26T14:34:00Z) - Optimizing Data Delivery: Insights from User Preferences on Visuals, Tables, and Text [59.68239795065175]
We conduct a user study where users are shown a question and asked what they would prefer to see.
We use the data to establish that a user's personal traits do influence the data outputs that they prefer.
arXiv Detail & Related papers (2024-11-12T00:24:31Z) - Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs [49.18567856499736]
We investigate whether large language models (LLMs) can be supportive of open-ended dialogue tutoring. We apply a range of knowledge tracing (KT) methods on the resulting labeled data to track student knowledge levels over an entire dialogue. We conduct experiments on two tutoring dialogue datasets, and show that a novel yet simple LLM-based method, LLMKT, significantly outperforms existing KT methods in predicting student response correctness in dialogues.
arXiv Detail & Related papers (2024-09-24T22:31:39Z) - Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs' tool use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z) - Improving Ontology Requirements Engineering with OntoChat and Participatory Prompting [3.3241053483599563]
Ontology requirements engineering (ORE) has primarily relied on manual methods, such as interviews and collaborative forums, to gather user requirements from domain experts.
The current OntoChat offers a framework for ORE that utilises large language models (LLMs) to streamline the process.
This study produces pre-defined prompt templates based on user queries, focusing on creating and refining personas, goals, scenarios, sample data, and data resources for user stories.
arXiv Detail & Related papers (2024-08-09T19:21:14Z) - I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation [60.00337758147594]
This study explores the proactive ability of LLMs to seek user support.
We propose metrics to evaluate the trade-off between performance improvements and user burden.
Our experiments show that without external feedback, many LLMs struggle to recognize their need for user support.
arXiv Detail & Related papers (2024-07-20T06:12:29Z) - Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course [49.296957552006226]
Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research.
This report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students.
arXiv Detail & Related papers (2024-07-07T00:17:24Z) - User Story Tutor (UST) to Support Agile Software Developers [0.4077787659104315]
We designed, implemented, applied, and evaluated a web application called User Story Tutor (UST).
UST checks the description of a given User Story for readability, and if needed, recommends appropriate practices for improvement.
UST may support the continuing education of agile development teams when writing and reviewing User Stories.
arXiv Detail & Related papers (2024-06-24T01:55:01Z) - Step-Back Profiling: Distilling User History for Personalized Scientific Writing [50.481041470669766]
Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals.
We introduce STEP-BACK PROFILING to personalize LLMs by distilling user history into concise profiles.
Our approach outperforms the baselines by up to 3.6 points on the general personalization benchmark.
arXiv Detail & Related papers (2024-06-20T12:58:26Z) - User-LLM: Efficient LLM Contextualization with User Embeddings [23.226164112909643]
User-LLM is a novel framework that leverages user embeddings to directly contextualize large language models with user history interactions.
Our approach achieves significant efficiency gains by representing user timelines directly as embeddings, leading to substantial inference speedups of up to 78.1X.
arXiv Detail & Related papers (2024-02-21T08:03:27Z)