How to Elicit Explainability Requirements? A Comparison of Interviews, Focus Groups, and Surveys
- URL: http://arxiv.org/abs/2505.23684v3
- Date: Wed, 09 Jul 2025 17:13:26 GMT
- Title: How to Elicit Explainability Requirements? A Comparison of Interviews, Focus Groups, and Surveys
- Authors: Martin Obaidi, Jakob Droste, Hannah Deters, Marc Herrmann, Raymond Ochsner, Jil Klünder, Kurt Schneider
- Abstract summary: This study examines the efficiency and effectiveness of three commonly used elicitation methods - focus groups, interviews, and online surveys. The results show that interviews were the most efficient, capturing the highest number of distinct needs per participant per time spent. We recommend a hybrid approach combining surveys and interviews to balance efficiency and coverage.
- Score: 2.30929645503432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As software systems grow increasingly complex, explainability has become a crucial non-functional requirement for transparency, user trust, and regulatory compliance. Eliciting explainability requirements is challenging, as different methods capture varying levels of detail and structure. This study examines the efficiency and effectiveness of three commonly used elicitation methods - focus groups, interviews, and online surveys - while also assessing the role of taxonomy usage in structuring and improving the elicitation process. We conducted a case study at a large German IT consulting company, utilizing web-based personnel management software. A total of two focus groups, 18 interviews, and an online survey with 188 participants were analyzed. The results show that interviews were the most efficient, capturing the highest number of distinct needs per participant per time spent. Surveys collected the most explanation needs overall but had high redundancy. Delayed taxonomy introduction resulted in a greater number and diversity of needs, suggesting that a two-phase approach is beneficial. Based on our findings, we recommend a hybrid approach combining surveys and interviews to balance efficiency and coverage. Future research should explore how automation can support elicitation and how taxonomies can be better integrated into different methods.
Related papers
- LLMREI: Automating Requirements Elicitation Interviews with LLMs [47.032121951473435]
This study introduces LLMREI, a chatbot designed to conduct requirements elicitation interviews with minimal human intervention. We evaluated its performance in 33 simulated stakeholder interviews. Our findings indicate that LLMREI makes a similar number of errors compared to human interviewers, is capable of extracting a large portion of requirements, and demonstrates a notable ability to generate highly context-dependent questions.
arXiv Detail & Related papers (2025-07-03T12:18:05Z)
- Feedback-Aware Monte Carlo Tree Search for Efficient Information Seeking in Goal-Oriented Conversations [10.352944689413398]
We introduce a novel approach to adaptive question-asking that combines Large Language Models (LLMs) for generating questions that maximize information gain with Monte Carlo Tree Search (MCTS). We present two key innovations: (1) an adaptive MCTS algorithm that balances exploration and exploitation for efficient search over potential questions; and (2) a clustering-based feedback algorithm that leverages prior experience to guide future interactions.
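To make the selection mechanism concrete: a UCB-style rule is a standard way such an MCTS balances exploration and exploitation over candidate questions. Below is a minimal sketch; the candidate questions, reward signal, and function names are illustrative assumptions, not the paper's implementation.
```python
# Sketch of exploration/exploitation-balanced question selection in the
# spirit of MCTS/UCB1; rewards stand in for measured information gain.
import math
import random

def select_question(candidates, stats, c=1.4):
    """Pick the candidate question with the highest UCB1 score.

    stats maps question -> (visit_count, total_reward).
    """
    total_visits = sum(stats.get(q, (0, 0.0))[0] for q in candidates) or 1
    best_q, best_score = None, float("-inf")
    for q in candidates:
        visits, total_reward = stats.get(q, (0, 0.0))
        if visits == 0:
            return q  # explore unvisited questions first
        score = total_reward / visits + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best_q, best_score = q, score
    return best_q

# Toy usage: simulate asking questions and updating statistics.
candidates = ["What is your goal?", "What is your budget?", "Any deadline?"]
stats = {}
for _ in range(20):
    q = select_question(candidates, stats)
    reward = random.random()  # placeholder for observed information gain
    visits, total = stats.get(q, (0, 0.0))
    stats[q] = (visits + 1, total + reward)
print(max(stats, key=lambda q: stats[q][1] / stats[q][0]))
```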
arXiv Detail & Related papers (2025-01-25T03:42:22Z)
- A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
- Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that the pointwise mutual information between a context and a question is an effective gauge for language model performance. We propose two methods that use the pointwise mutual information between a document and a question as a gauge for selecting and constructing prompts that lead to better performance.
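To make the gauge concrete: PMI(d, q) = log p(q | d) - log p(q), which can be estimated from a causal language model's token log-probabilities. Below is a minimal sketch assuming a HuggingFace model; the model choice and helper names are illustrative, not the paper's code.
```python
# Sketch: estimate PMI between a document and a question with a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sequence_logprob(prefix: str, target: str) -> float:
    """Sum of log p(target token | prefix and preceding target tokens)."""
    if not prefix:
        prefix = tokenizer.bos_token  # unconditional: condition on BOS only
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    target_ids = tokenizer(target, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, target_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    start = prefix_ids.shape[1]
    total = 0.0
    for i in range(target_ids.shape[1]):
        # logits at position start+i-1 predict the token at position start+i
        total += log_probs[0, start + i - 1, target_ids[0, i]].item()
    return total

def pmi(document: str, question: str) -> float:
    """PMI(d, q) = log p(q | d) - log p(q), estimated with the causal LM."""
    return sequence_logprob(document + "\n", question) - sequence_logprob("", question)

# Toy usage: rank candidate context documents for one question by PMI.
docs = ["The Eiffel Tower is in Paris.", "Bananas are rich in potassium."]
question = "Where is the Eiffel Tower?"
print(max(docs, key=lambda d: pmi(d, question)))
```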
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
- A Comprehensive Survey on Retrieval Methods in Recommender Systems [32.1847120460637]
This survey explores the critical yet often overlooked retrieval stage of recommender systems.
To achieve precise and efficient personalized retrieval, we summarize existing work in three key areas.
We highlight current industrial applications through a case study on retrieval practices at a specific company.
arXiv Detail & Related papers (2024-07-11T07:09:59Z)
- Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation [49.36436704082436]
How-to questions are integral to decision-making processes and require dynamic, step-by-step answers.
We propose Thread, a novel data organization paradigm aimed at enabling current systems to handle how-to questions more effectively.
arXiv Detail & Related papers (2024-06-19T09:14:41Z)
- Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search [89.1772985740272]
In mixed-initiative conversational search systems, clarifying questions are used to help users who struggle to express their intentions in a single query.
We hypothesize that in scenarios where multimodal information is pertinent, the clarification process can be improved by using non-textual information.
We collect a dataset named Melon that contains over 4k multimodal clarifying questions, enriched with over 14k images.
Several analyses are conducted to understand the importance of multimodal contents during the query clarification phase.
arXiv Detail & Related papers (2024-02-12T16:04:01Z)
- Promoting Research Collaboration with Open Data Driven Team Recommendation in Response to Call for Proposals [10.732914229005903]
We describe a novel system to recommend teams using a variety of AI methods.
We create teams to maximize goodness along a metric balancing short- and long-term objectives.
arXiv Detail & Related papers (2023-09-18T00:04:08Z)
- Leveraging Human Feedback to Scale Educational Datasets: Combining Crowdworkers and Comparative Judgement [0.0]
This paper reports on two experiments investigating the use of non-expert crowdworkers and comparative judgement to evaluate student data.
We found that using comparative judgement substantially improved inter-rater reliability on both tasks.
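To make "comparative judgement" concrete: pairwise "which is better?" decisions are typically aggregated into per-item scores, for example with a Bradley-Terry model. Below is a minimal sketch of that aggregation; it is a common choice for such data, not necessarily the method the paper used.
```python
# Sketch: aggregate pairwise judgements into per-item quality scores with
# a Bradley-Terry model fit by simple gradient ascent.
import math

def bradley_terry(items, comparisons, lr=0.1, epochs=200):
    """comparisons: list of (winner, loser) pairs from crowdworkers."""
    theta = {item: 0.0 for item in items}
    for _ in range(epochs):
        grad = {item: 0.0 for item in items}
        for winner, loser in comparisons:
            # p(winner beats loser) under current scores
            p = 1.0 / (1.0 + math.exp(theta[loser] - theta[winner]))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for item in items:
            theta[item] += lr * grad[item]
    return theta

# Toy usage: essay C is preferred most often, so it should score highest.
judgements = [("C", "A"), ("C", "B"), ("B", "A"), ("C", "A"), ("B", "A")]
scores = bradley_terry({"A", "B", "C"}, judgements)
print(sorted(scores, key=scores.get, reverse=True))  # ['C', 'B', 'A']
```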
arXiv Detail & Related papers (2023-05-22T10:22:14Z)
- Reinforcement Learning Guided Multi-Objective Exam Paper Generation [21.945655389912112]
We propose a reinforcement learning guided Multi-Objective Exam Paper Generation framework, termed MOEPG.
It simultaneously optimizes three exam domain-specific objectives: difficulty degree, distribution of exam scores, and skill coverage.
We show that MOEPG is feasible in addressing the multiple dilemmas of the exam paper generation scenario.
arXiv Detail & Related papers (2023-03-02T07:55:52Z)
- Scaling up Search Engine Audits: Practical Insights for Algorithm Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions.
We demonstrate the successful performance of our research infrastructure across multiple data collections.
We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms across long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z)
- Mining Implicit Relevance Feedback from User Behavior for Web Question Answering [92.45607094299181]
We conduct the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.