Can Internal Software Metrics Predict App Popularity at Launch? Yeas! and Nays!
- URL: http://arxiv.org/abs/2507.02110v1
- Date: Wed, 02 Jul 2025 19:47:41 GMT
- Title: Can Internal Software Metrics Predict App Popularity at Launch? Yeas! and Nays!
- Authors: Md Nahidul Islam Opu, Fatima Islam Mouri, Rick Kazman, Yuanfang Cai, Shaiful Chowdhury
- Abstract summary: This study investigates whether internal software metrics, measurable from source code before deployment, can predict an app's popularity. It uses a dataset of 446 open-source Android apps from F-Droid.
- Score: 10.416999845729054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting mobile app popularity before release can provide developers with a strategic advantage in a competitive marketplace, yet it remains a challenging problem. This study explores whether internal software metrics, measurable from source code before deployment, can predict an app's popularity, defined by user ratings (calculated from user reviews) and DownloadsPerYear (yearly downloads). Using a dataset of 446 open-source Android apps from F-Droid, we extract a wide array of features, including system-, class-, and method-level code metrics, code smells, and app metadata. Additional information, such as user reviews, download counts, and uses-permission entries, was collected from the Google Play Store. We evaluate regression and classification models across three feature sets: a minimal Size-only baseline, a domain-informed Handpicked set, and a Voting set derived via feature selection algorithms. Regression models perform poorly due to skewed data, with low $R^2$ scores. However, when reframed as binary classification (Popular vs. Unpopular), results improve significantly. The best model, a Multilayer Perceptron using the Voting set, achieves an F1-score of 0.72. These results suggest that internal code metrics, although limited in their explanatory power, can serve as useful indicators of app popularity. This challenges earlier findings that dismissed internal metrics as predictors of software quality.
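The classification setup the abstract describes is straightforward to prototype. Below is a minimal scikit-learn sketch of that idea: binarize DownloadsPerYear and train a Multilayer Perceptron on pre-release code metrics. The CSV file name, feature columns, median-split threshold, and network size are illustrative assumptions, not the authors' pipeline; the paper's actual Size-only, Handpicked, and Voting feature sets and evaluation protocol are in the full text.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

# Hypothetical input: one row per app, internal metrics plus store outcomes.
df = pd.read_csv("fdroid_apps.csv")  # assumed file name, not from the paper

# DownloadsPerYear as defined in the abstract: downloads normalized by app age.
df["DownloadsPerYear"] = df["downloads"] / df["years_on_store"]

# Reframe as binary classification (Popular vs. Unpopular); a median split is
# one plausible threshold -- the paper's exact cutoff may differ.
df["Popular"] = (df["DownloadsPerYear"] > df["DownloadsPerYear"].median()).astype(int)

# Stand-in columns for the "Voting" feature set produced by feature selection.
features = ["sloc", "wmc", "cbo", "num_code_smells", "num_permissions"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Popular"],
    test_size=0.2, random_state=42, stratify=df["Popular"],
)

# MLPs are sensitive to feature scale, hence the StandardScaler.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)
print("F1:", f1_score(y_test, model.predict(X_test)))  # paper reports ~0.72
```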
Related papers
- MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools [54.63478102768333]
Well-calibrated model confidences can be used to weigh the risk versus reward of potential actions. We propose a novel class of model-internal confidence estimators (MICE) to better assess confidence when calling tools.
arXiv Detail & Related papers (2025-04-28T18:06:38Z) - Prioritizing App Reviews for Developer Responses on Google Play [1.5771347525430772]
Since 2013, Google Play has allowed developers to respond to user reviews. Only 13% to 18% of developers engage in this practice. We propose a method to prioritize reviews according to their need for a developer response.
arXiv Detail & Related papers (2025-02-03T16:56:08Z) - What should an AI assessor optimise for? [57.96463917842822]
An AI assessor is an external, ideally independent system that predicts an indicator, e.g., a loss value, of another AI system. Here we address the question: is it always optimal to train the assessor for the target metric? We experimentally explore this question for regression losses and classification scores with monotonic and non-monotonic mappings, respectively.
arXiv Detail & Related papers (2025-02-01T08:41:57Z) - Robust Collaborative Filtering to Popularity Distribution Shift [56.78171423428719]
We present a simple yet effective debiasing strategy, PopGo, which quantifies and reduces the interaction-wise popularity shortcut without assumptions on the test data.
On both ID and OOD test sets, PopGo achieves significant gains over the state-of-the-art debiasing strategies.
arXiv Detail & Related papers (2023-10-16T04:20:52Z) - Revisiting Android App Categorization [5.805764439228492]
We present a comprehensive evaluation of existing Android app categorization approaches using our new ground-truth dataset.
We propose two innovative approaches that outperform existing methods in both description-based and APK-based methodologies.
arXiv Detail & Related papers (2023-10-11T08:25:34Z) - Proactive Prioritization of App Issues via Contrastive Learning [2.6763498831034043]
We propose a new framework, PPrior, that enables proactive prioritization of app issues through identifying prominent reviews.
PPrior employs a pre-trained T5 model and works in three phases.
Phase one adapts the pre-trained T5 model to the user-review data in a self-supervised fashion. Phase two uses contrastive training to learn a generic, task-independent representation of user reviews.
arXiv Detail & Related papers (2023-03-12T06:23:10Z) - Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
arXiv Detail & Related papers (2022-04-05T19:39:13Z) - AppQ: Warm-starting App Recommendation Based on View Graphs [37.37177133951606]
New apps often have little (or even no) user feedback, suffering from the classic cold-start problem.
Here, a fundamental requirement is the capability to accurately measure an app's quality based on its inborn features, rather than user-generated features.
We propose AppQ, a novel app quality grading and recommendation system that extracts inborn features of apps based on app source code.
arXiv Detail & Related papers (2021-09-08T17:40:48Z) - PiRank: Learning To Rank via Differentiable Sorting [85.28916333414145]
We propose PiRank, a new class of differentiable surrogates for ranking.
We show that PiRank exactly recovers the desired metrics in the limit of zero temperature.
arXiv Detail & Related papers (2020-12-12T05:07:36Z) - Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST sentiment-topic model, MERIT infers the topics negatively reflected in user reviews for a given app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.