Related papers: MobileUPReg: Identifying User-Perceived Performance Regressions in Mobile OS Versions

URL: http://arxiv.org/abs/2509.16864v1
Date: Sun, 21 Sep 2025 01:30:00 GMT
Title: MobileUPReg: Identifying User-Perceived Performance Regressions in Mobile OS Versions
Authors: Wei Liu, Yi Wen Heng, Feng Lin, Tse-Hsun Chen, Ahmed E. Hassan
Abstract summary: Mobile operating systems (OS) are frequently updated, but such updates can unintentionally degrade user experience by introducing performance regressions. Existing detection techniques often rely on system-level metrics or focus on specific OS components. We present MobileUPReg, a black-box framework for detecting user-perceived performance regressions across OS versions.
Score: 23.30663566219316
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mobile operating systems (OS) are frequently updated, but such updates can unintentionally degrade user experience by introducing performance regressions. Existing detection techniques often rely on system-level metrics (e.g., CPU or memory usage) or focus on specific OS components, which may miss regressions actually perceived by users -- such as slower responses or UI stutters. To address this gap, we present MobileUPReg, a black-box framework for detecting user-perceived performance regressions across OS versions. MobileUPReg runs the same apps under different OS versions and compares user-perceived performance metrics -- response time, finish time, launch time, and dropped frames -- to identify regressions that are truly perceptible to users. In a large-scale study, MobileUPReg achieves high accuracy in extracting user-perceived metrics and detects user-perceived regressions with 0.96 precision, 0.91 recall, and 0.93 F1-score -- significantly outperforming a statistical baseline using the Wilcoxon rank-sum test and Cliff's Delta. MobileUPReg has been deployed in an industrial CI pipeline, where it analyzes thousands of screencasts across hundreds of apps daily and has uncovered regressions missed by traditional tools. These results demonstrate that MobileUPReg enables accurate, scalable, and perceptually aligned regression detection for mobile OS validation.
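The abstract names a statistical baseline built on the Wilcoxon rank-sum test and Cliff's Delta but does not say how it is configured. The following is a minimal sketch of how such a baseline could flag a regression between two OS versions; it is not MobileUPReg's own method. The function names, the sample launch times, the significance level of 0.05, and the 0.474 "large effect" cutoff for Cliff's Delta are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def cliffs_delta(old, new):
    """Cliff's delta: P(new > old) - P(new < old), a value in [-1, 1]."""
    greater = sum(1 for n in new for o in old if n > o)
    less = sum(1 for n in new for o in old if n < o)
    return (greater - less) / (len(old) * len(new))


def rank_sum_p_value(old, new):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(old), len(new)
    # U statistic: count pairs where the new sample is larger; ties count as half.
    u = sum(1.0 if n > o else 0.5 if n == o else 0.0 for n in new for o in old)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_regression(old, new, alpha=0.05, min_delta=0.474):
    """Flag a regression when the new values are significantly and substantially larger.

    Assumes higher values are worse (e.g. launch time in milliseconds).
    alpha and min_delta are illustrative defaults, not values from the paper.
    """
    return rank_sum_p_value(old, new) < alpha and cliffs_delta(old, new) >= min_delta


# Hypothetical launch times (ms) for one app on two OS versions.
old_launch_ms = [300, 310, 305, 298, 302, 307, 299, 304]
new_launch_ms = [350, 362, 355, 349, 358, 360, 351, 357]

regressed = is_regression(old_launch_ms, new_launch_ms)
print("regression detected:", regressed)
```

A check of this kind only asks whether two distributions differ; it does not model whether a user would notice the difference, which is the gap the abstract says MobileUPReg addresses.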

Related papers

A Rubric-Supervised Critic from Sparse Real-World Outcomes [87.11204512676193]
Real-world coding agents operate with humans in the loop, where success signals are typically noisy, delayed, and sparse. We propose a process to learn a "critic" model from sparse and noisy interaction data, which can then be used both as a reward model for either RL-based training or inference-time scaling.
arXiv Detail & Related papers (2026-03-04T07:23:54Z)
Screencast-Based Analysis of User-Perceived GUI Responsiveness [53.53923672866705]
tool is a technique that measures GUI responsiveness directly from mobile screencasts. It uses computer vision to detect user interactions and analyzes frame-level visual changes to compute two key metrics. tool has been deployed in an industrial testing pipeline and analyzes thousands of screencasts daily.
arXiv Detail & Related papers (2025-08-02T12:13:50Z)
Regression-aware Continual Learning for Android Malware Detection [9.695692033183485]
Malware evolves rapidly, forcing machine learning (ML)-based detectors to adapt continuously. Continual learning (CL) has emerged as a scalable alternative, enabling incremental updates without full data access. But security regression captures harmful prediction changes at the sample level, such as a malware sample that was once correctly detected but evades detection after a model update. We formalize and quantify security regression in CL-based malware detectors and propose a regression-aware penalty to mitigate it.
arXiv Detail & Related papers (2025-07-24T11:31:23Z)
What If We Had Used a Different App? Reliable Counterfactual KPI Analysis in Wireless Systems [52.499838151272016]
This paper addresses the problem of estimating the values of traffic that would have been obtained if a different app had been implemented by the RAN. We propose a conformal-prediction-based counterfactual analysis method for wireless systems.
arXiv Detail & Related papers (2024-09-30T18:47:26Z)
Monitoring and Adapting ML Models on Mobile Devices [17.28565076128893]
We design the first end-to-end system for continuously monitoring and adapting models on mobile devices without requiring feedback from users. Our key observation is that often model degradation is due to a specific root cause, which may affect a large group of devices. We evaluate the system on two computer vision datasets, and show it consistently boosts accuracy compared to existing approaches.
arXiv Detail & Related papers (2023-05-12T21:33:26Z)
PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
Using sequential drift detection to test the API economy [4.056434158960926]
API economy refers to the widespread integration of API (application programming interface). It is desirable to monitor the usage patterns and identify when the system is used in a way that was never used before. In this work we analyze both histograms and call graph of API usage to determine if the usage patterns of the system have shifted.
arXiv Detail & Related papers (2021-11-09T13:24:19Z)
Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates. We formulate the regression-free model updates into a constrained optimization problem. We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
RepPoints V2: Verification Meets Regression for Object Detection [65.120827759348]
We introduce verification tasks into the localization prediction of RepPoints. RepPoints v2 provides consistent improvements of about 2.0 mAP over the original RepPoints. We show that the proposed approach can more generally elevate other object detection frameworks as well as applications such as instance segmentation.
arXiv Detail & Related papers (2020-07-16T17:57:08Z)
Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction [58.98112070128482]
We propose a lightweight solution for series prediction based on historic observations. It consists of a heterogeneous ensemble method composed of two models - a neural network and a mean predictor. It achieves an overall $R^2$ score of 0.10 on the available FedCSIS 2020 challenge dataset.
arXiv Detail & Related papers (2020-07-07T15:44:16Z)
Lumos: A Library for Diagnosing Metric Regressions in Web-Scale Applications [13.52733069152118]
We present Lumos, a Python library built using the principles of AB testing to systematically diagnose metric regressions. Lumos has been deployed across the component teams in Microsoft's Real-Time Communication applications Skype and Microsoft Teams. It has enabled engineering teams to detect 100s of real changes in metrics and reject 1000s of false alarms detected by anomaly detectors.
arXiv Detail & Related papers (2020-06-23T07:02:07Z)
