Related papers: Leveraging LLMs for User Stories in AI Systems: UStAI Dataset

URL: http://arxiv.org/abs/2504.00513v2
Date: Wed, 23 Apr 2025 11:26:49 GMT
Title: Leveraging LLMs for User Stories in AI Systems: UStAI Dataset
Authors: Asma Yamani, Malak Baslyman, Moataz Ahmed
Abstract summary: Large Language Models (LLMs) are emerging as a promising alternative to human-generated text. This paper investigates the potential use of LLMs to generate user stories for AI systems based on abstracts from scholarly papers. Our analysis demonstrates that the investigated LLMs can generate user stories inspired by the needs of various stakeholders.
Score: 0.38233569758620056
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI systems are gaining widespread adoption across various sectors and domains. Creating high-quality AI system requirements is crucial for aligning the AI system with business goals and consumer values and for social responsibility. However, with the uncertain nature of AI systems and the heavy reliance on sensitive data, more research is needed to address the elicitation and analysis of AI systems requirements. With the proprietary nature of many AI systems, there is a lack of open-source requirements artifacts and technical requirements documents for AI systems, limiting broader research and investigation. With Large Language Models (LLMs) emerging as a promising alternative to human-generated text, this paper investigates the potential use of LLMs to generate user stories for AI systems based on abstracts from scholarly papers. We conducted an empirical evaluation using three LLMs and generated $1260$ user stories from $42$ abstracts from $26$ domains. We assess their quality using the Quality User Story (QUS) framework. Moreover, we identify relevant non-functional requirements (NFRs) and ethical principles. Our analysis demonstrates that the investigated LLMs can generate user stories inspired by the needs of various stakeholders, offering a promising approach for generating user stories for research purposes and for aiding in the early requirements elicitation phase of AI systems. We have compiled and curated a collection of stories generated by various LLMs into a dataset (UStAI), which is now publicly available for use.

Related papers

The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes [0.0]
We present results from an empirical exploration, where a human expert-generated SE artifact was taken as a benchmark. We then adopted a two-fold mixed-methods approach to compare AI-generated artifacts against the benchmark. We find that while the two materials appear very similar, AI-generated artifacts exhibit serious failure modes that could be difficult to detect.
arXiv Detail & Related papers (2025-02-13T17:05:18Z)
AI-driven Personalized Privacy Assistants: a Systematic Literature Review [0.0]
We present a Systematic Literature Review (SLR) to map the existing solutions found in the scientific literature. We screened several hundred unique research papers over the recent years (2013-2025), constructing a classification from 41 included papers. We provide a comprehensive classification for AI-driven PPAs, delving into their architectural choices, system contexts, types of AI used, data sources, types of decisions, and control over decisions, among other facets.
arXiv Detail & Related papers (2025-02-11T16:46:56Z)
Analysis of LLMs vs Human Experts in Requirements Engineering [0.0]
Research on applying Large Language Models (LLMs) to software development has centered on code generation. This study compares an LLM's ability to elicit the requirements of a software system with that of a human expert in a time-boxed and prompt-boxed study.
arXiv Detail & Related papers (2025-01-31T16:55:17Z)
Can We Trust AI Agents? An Experimental Study Towards Trustworthy LLM-Based Multi-Agent Systems for AI Ethics [10.084913433923566]
This study examines how trustworthiness-enhancing techniques affect ethical AI output generation. We design the prototype LLM-BMAS, where agents engage in structured discussions on real-world ethical AI issues. Discussions reveal terms like bias detection, transparency, accountability, user consent, compliance, fairness evaluation, and EU AI Act compliance.
arXiv Detail & Related papers (2024-10-25T20:17:59Z)
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models [55.903148392998965]
We introduce LOKI, a novel benchmark designed to evaluate the ability of LMMs to detect synthetic data across multiple modalities. The benchmark includes coarse-grained judgment and multiple-choice questions, as well as fine-grained anomaly selection and explanation tasks. We evaluate 22 open-source LMMs and 6 closed-source models on LOKI, highlighting their potential as synthetic data detectors and also revealing some limitations in the development of LMM capabilities.
arXiv Detail & Related papers (2024-10-13T05:26:36Z)
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation. Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge. RA-LLMs have emerged to harness external and authoritative knowledge bases, rather than relying on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z)
Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability [28.67753149592534]
This study bridges the accountability gap by introducing our effort towards a comprehensive metrics catalogue. Our catalogue delineates process metrics that underpin procedural integrity, resource metrics that provide necessary tools and frameworks, and product metrics that reflect the outputs of AI systems.
arXiv Detail & Related papers (2023-11-22T04:43:16Z)
AI for All: Operationalising Diversity and Inclusion Requirements for AI Systems [4.884533605897174]
This research aims to address the lack of research and practice on how to elicit and capture D&I requirements for AI systems. We have proposed a tailored user story template to capture D&I requirements and conducted focus group exercises to use the themes and user story template in writing D&I requirements for two example AI systems.
arXiv Detail & Related papers (2023-11-07T23:15:03Z)
Recommender Systems in the Era of Large Language Models (LLMs) [62.0129013439038]
Large Language Models (LLMs) have revolutionized the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI). We conduct a comprehensive review of LLM-empowered recommender systems from various aspects including Pre-training, Fine-tuning, and Prompting.
arXiv Detail & Related papers (2023-07-05T06:03:40Z)
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions. This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision. We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach. Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes. We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data. We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases. Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.