From App Features to Explanation Needs: Analyzing Correlations and Predictive Potential
- URL: http://arxiv.org/abs/2508.03881v1
- Date: Tue, 05 Aug 2025 19:46:13 GMT
- Title: From App Features to Explanation Needs: Analyzing Correlations and Predictive Potential
- Authors: Martin Obaidi, Kushtrim Qengaj, Jakob Droste, Hannah Deters, Marc Herrmann, Jil Klünder, Elisa Schmid, Kurt Schneider
- Abstract summary: This study investigates whether explanation needs, classified from user reviews, can be predicted based on app properties. We analyzed a gold standard dataset of 4,495 app reviews enriched with metadata.
- Score: 2.2139415366377375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In today's digitized world, software systems must support users in understanding both how to interact with a system and why certain behaviors occur. This study investigates whether explanation needs, classified from user reviews, can be predicted based on app properties, enabling early consideration during development and large-scale requirements mining. We analyzed a gold standard dataset of 4,495 app reviews enriched with metadata (e.g., app version, ratings, age restriction, in-app purchases). Correlation analyses identified mostly weak associations between app properties and explanation needs, with moderate correlations only for specific features such as app version, number of reviews, and star ratings. Linear regression models showed limited predictive power, with no reliable forecasts across configurations. Validation on a manually labeled dataset of 495 reviews confirmed these findings. Categories such as Security & Privacy and System Behavior showed slightly higher predictive potential, while Interaction and User Interface remained most difficult to predict. Overall, our results highlight that explanation needs are highly context-dependent and cannot be precisely inferred from app metadata alone. Developers and requirements engineers should therefore supplement metadata analysis with direct user feedback to effectively design explainable and user-centered software systems.
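The two analysis steps the abstract describes, correlating app properties with explanation-need counts and then fitting a linear regression as a predictor, can be sketched roughly as follows. The data and property values below are made up for illustration; the actual study used the gold standard dataset of 4,495 labeled reviews.

```python
# Illustrative sketch of the study's two analysis steps:
# (1) correlate an app property with explanation-need counts,
# (2) fit a one-variable least-squares regression as a predictor.
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def linreg(xs, ys):
    """Ordinary least squares fit y = a*x + b for a single predictor."""
    r = pearson(xs, ys)
    a = r * statistics.stdev(ys) / statistics.stdev(xs)
    b = statistics.fmean(ys) - a * statistics.fmean(xs)
    return a, b

# Hypothetical per-app data: star rating vs. the number of reviews
# labeled as expressing an explanation need.
star_rating = [4.5, 3.9, 4.1, 2.8, 3.2, 4.8]
explanation_needs = [3, 7, 5, 12, 10, 2]

r = pearson(star_rating, explanation_needs)
a, b = linreg(star_rating, explanation_needs)
print(f"correlation r = {r:.2f}")                       # strongly negative in this toy data
print(f"predicted needs at rating 4.0: {a * 4.0 + b:.1f}")
```

On real app-store metadata the study found only weak to moderate correlations, so a regression like this one yields little predictive power; the toy numbers here are deliberately clean to show the mechanics, not the finding.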
Related papers
- Modeling User Behavior from Adaptive Surveys with Supplemental Context [1.433758865948252]
We present LANTERN, a modular architecture for modeling user behavior by fusing adaptive survey responses with contextual signals. We demonstrate the architectural value of maintaining survey primacy through selective gating, residual connections and late fusion. We further investigate threshold sensitivity and the benefits of selective modality reliance through ablation and rare/frequent attribute analysis.
arXiv Detail & Related papers (2025-07-28T15:19:54Z)
- Mind the Gap! Static and Interactive Evaluations of Large Audio Models [55.87220295533817]
Large Audio Models (LAMs) are designed to power voice-native experiences. This study introduces an interactive approach to evaluate LAMs, collecting 7,500 LAM interactions from 484 participants.
arXiv Detail & Related papers (2025-02-21T20:29:02Z)
- Were You Helpful -- Predicting Helpful Votes from Amazon Reviews [0.0]
This project investigates factors that influence the perceived helpfulness of Amazon product reviews through machine learning techniques. We identify key metadata characteristics that serve as strong predictors of review helpfulness. This insight suggests that contextual and user-behavioral factors may be more indicative of review helpfulness than the linguistic content itself.
arXiv Detail & Related papers (2024-12-03T22:38:58Z)
- Towards Extracting Ethical Concerns-related Software Requirements from App Reviews [0.0]
This study analyzes app reviews of the Uber mobile application (a popular taxi/ride app).
We propose a novel approach that leverages a knowledge graph (KG) model to extract software requirements from app reviews.
Our framework consists of three main components: developing an ontology with relevant entities and relations, extracting key entities from app reviews, and creating connections between them.
arXiv Detail & Related papers (2024-07-19T04:50:32Z)
- Estimation of the User Contribution Rate by Leveraging Time Sequence in Pairwise Matching function-point between Users Feedback and App Updating Log [3.750389260169302]
This paper proposes a quantitative analysis approach based on the temporal correlation perception that exists in the app update log and user reviews.
The main idea of this scheme is to consider valid user reviews as user requirements and app update logs as developer responses.
The study found that 16.6%-43.2% of these apps' features were driven by popular online user requirements.
arXiv Detail & Related papers (2023-11-26T03:52:45Z)
- ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning [63.77667876176978]
Large language models show improved downstream task interpretability when prompted to generate step-by-step reasoning to justify their final answers.
These reasoning steps greatly improve model interpretability and verification, but objectively studying their correctness is difficult.
We present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics.
arXiv Detail & Related papers (2022-12-15T15:52:39Z)
- Ordinal Graph Gamma Belief Network for Social Recommender Systems [54.9487910312535]
We develop a hierarchical Bayesian model termed ordinal graph factor analysis (OGFA), which jointly models user-item and user-user interactions.
OGFA not only achieves good recommendation performance, but also extracts interpretable latent factors corresponding to representative user preferences.
We extend OGFA to ordinal graph gamma belief network, which is a multi-stochastic-layer deep probabilistic model.
arXiv Detail & Related papers (2022-09-12T09:19:22Z)
- A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information for the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
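The perceptual-score idea in the last entry, measuring how much a model relies on a given input modality, can be approximated by comparing accuracy on intact inputs against accuracy after that modality is permuted across examples. The permutation-based estimate below is a common proxy assumed for illustration, not the paper's exact definition; the model and data are toy stand-ins.

```python
# Rough sketch of a perceptual-score-style measurement: how much does a
# classifier's accuracy drop when one input modality is permuted?
def accuracy(model, features, labels):
    """Fraction of examples the model classifies correctly."""
    return sum(model(f) == y for f, y in zip(features, labels)) / len(labels)

def modality_score(model, features, labels, modality_idx):
    """Accuracy on intact inputs minus accuracy after permuting one modality.
    A cyclic shift is used as the permutation so the example is deterministic."""
    base = accuracy(model, features, labels)
    column = [f[modality_idx] for f in features]
    column = column[1:] + column[:1]  # shift breaks the feature-label pairing
    permuted = [tuple(column[j] if i == modality_idx else c
                      for i, c in enumerate(f))
                for j, f in enumerate(features)]
    return base - accuracy(model, permuted, labels)

# Toy "multi-modal" model that only looks at modality 0 (say, text)
# and ignores modality 1 (say, image) entirely.
model = lambda f: f[0] > 0
features = [(1, 5), (-1, 5), (2, -3), (-2, -3)]
labels = [True, False, True, False]

print(modality_score(model, features, labels, 0))  # 1.0: fully relies on modality 0
print(modality_score(model, features, labels, 1))  # 0.0: modality 1 is ignored
```

A score near zero for a modality, as the abstract reports for visual data in some recent visual question-answering models, means permuting that modality barely changes the model's predictions.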
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.