Requirements Elicitation Follow-Up Question Generation
- URL: http://arxiv.org/abs/2507.02858v1
- Date: Thu, 03 Jul 2025 17:59:04 GMT
- Title: Requirements Elicitation Follow-Up Question Generation
- Authors: Yuchen Shen, Anmol Singhal, Travis Breaux
- Abstract summary: Large language models (LLMs) have exhibited state-of-the-art performance in multiple natural language processing tasks. This study investigates the application of GPT-4o to generate follow-up interview questions during requirements elicitation.
- Score: 0.5120567378386615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interviews are a widely used technique in eliciting requirements to gather stakeholder needs, preferences, and expectations for a software system. Effective interviewing requires skilled interviewers to formulate appropriate interview questions in real time while facing multiple challenges, including lack of familiarity with the domain, excessive cognitive load, and information overload that hinders how humans process stakeholders' speech. Recently, large language models (LLMs) have exhibited state-of-the-art performance in multiple natural language processing tasks, including text summarization and entailment. To support interviewers, we investigate the application of GPT-4o to generate follow-up interview questions during requirements elicitation by building on a framework of common interviewer mistake types. In addition, we describe methods to generate questions based on interviewee speech. We report a controlled experiment to evaluate LLM-generated and human-authored questions with minimal guidance, and a second controlled experiment to evaluate the LLM-generated questions when generation is guided by interviewer mistake types. Our findings demonstrate that, for both experiments, the LLM-generated questions are no worse than the human-authored questions with respect to clarity, relevancy, and informativeness. In addition, LLM-generated questions outperform human-authored questions when guided by common mistake types. This highlights the potential of using LLMs to help interviewers improve the quality and ease of requirements elicitation interviews in real time.
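The approach described in the abstract lends itself to a simple prompting setup. The following is a minimal sketch, not the authors' implementation: it assumes the OpenAI Python SDK's chat completions API, and the mistake-type labels, prompt wording, and function names are illustrative placeholders rather than the framework's actual categories.

```python
# Minimal sketch (not the authors' implementation): prompting GPT-4o to propose
# follow-up questions guided by a list of common interviewer mistake types.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the mistake-type labels below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

MISTAKE_TYPES = [
    "asking vague or ambiguous questions",
    "failing to probe incomplete answers",
    "ignoring implicit stakeholder assumptions",
]

def follow_up_questions(interviewee_turn: str, n_questions: int = 3) -> str:
    """Ask GPT-4o for follow-up questions that avoid common interviewer mistakes."""
    prompt = (
        "You are assisting a requirements elicitation interview.\n"
        f"The stakeholder just said:\n\"{interviewee_turn}\"\n\n"
        "Propose follow-up questions that are clear, relevant, and informative, "
        "and that avoid these common interviewer mistakes:\n- "
        + "\n- ".join(MISTAKE_TYPES)
        + f"\n\nReturn {n_questions} questions, one per line."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    turn = "We mostly need the system to handle monthly reporting, but it has to be fast."
    print(follow_up_questions(turn))
```

In practice, a real-time assistant would call a function like this on each interviewee turn and surface the candidate questions to the interviewer, who decides which, if any, to ask.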
Related papers
- LLMREI: Automating Requirements Elicitation Interviews with LLMs [47.032121951473435]
This study introduces LLMREI, a chatbot designed to conduct requirements elicitation interviews with minimal human intervention. We evaluated its performance in 33 simulated stakeholder interviews. Our findings indicate that LLMREI makes a similar number of errors compared to human interviewers, is capable of extracting a large portion of requirements, and demonstrates a notable ability to generate highly context-dependent questions.
arXiv Detail & Related papers (2025-07-03T12:18:05Z)
- Using Large Language Models to Develop Requirements Elicitation Skills [1.1473376666000734]
We propose conditioning a large language model to play the role of the client during a chat-based interview. We find that both approaches provide sufficient information for participants to construct technically sound solutions.
arXiv Detail & Related papers (2025-03-10T19:27:38Z)
- GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing [73.8469700907927]
Large Language Models (LLMs) succeed in human-guided conversations such as instruction following and question answering. In this study, we first characterize LLM-guided conversation into three fundamental components: (i) Goal Navigation; (ii) Context Management; (iii) Empathetic Engagement. We compare GuideLLM with six state-of-the-art LLMs such as GPT-4o and Llama-3-70b-Instruct in terms of interviewing quality and autobiography generation quality.
arXiv Detail & Related papers (2025-02-10T14:11:32Z)
- Can LLMs Ask Good Questions? [45.54763954234726]
We evaluate questions generated by large language models (LLMs) from context. We compare them to human-authored questions across six dimensions: question type, question length, context coverage, answerability, uncommonness, and required answer length.
arXiv Detail & Related papers (2025-01-07T03:21:17Z)
- NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews [65.35458530702442]
We focus on journalistic interviews, a domain rich in grounding communication and abundant in data.
We curate a dataset of 40,000 two-person informational interviews from NPR and CNN.
LLMs are significantly less likely than human interviewers to use acknowledgements and to pivot to higher-level questions.
arXiv Detail & Related papers (2024-11-21T01:37:38Z)
- AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs [53.6200736559742]
AGENT-CQ consists of two stages: a generation stage and an evaluation stage.
CrowdLLM simulates human crowdsourcing judgments to assess generated questions and answers.
Experiments on the ClariQ dataset demonstrate CrowdLLM's effectiveness in evaluating question and answer quality.
arXiv Detail & Related papers (2024-10-25T17:06:27Z)
- ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
We study out-of-scope questions, where the retrieved document appears semantically similar to the question but lacks the necessary information to answer it. We propose a guided hallucination-based approach, ELOQ, to automatically generate a diverse set of out-of-scope questions from post-cutoff documents.
arXiv Detail & Related papers (2024-10-18T16:11:29Z)
- AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers [40.80290002598963]
This study explores the potential of replacing human interviewers with large language models (LLMs) to conduct scalable conversational interviews. We conducted a small-scale, in-depth study with university students who were randomly assigned to a conversational interview by either AI or human interviewers. Various quantitative and qualitative measures assessed interviewer adherence to guidelines, response quality, participant engagement, and overall interview efficacy.
arXiv Detail & Related papers (2024-09-16T16:03:08Z)
- CaLMQA: Exploring culturally specific long-form question answering across 23 languages [58.18984409715615]
CaLMQA is a dataset of 51.7K culturally specific questions across 23 different languages. We evaluate the factuality, relevance, and surface-level quality of LLM-generated long-form answers.
arXiv Detail & Related papers (2024-06-25T17:45:26Z)
- What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We constructed a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions.
arXiv Detail & Related papers (2022-05-23T00:57:33Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)