Advancing Mobile UI Testing by Learning Screen Usage Semantics
- URL: http://arxiv.org/abs/2505.09894v1
- Date: Thu, 15 May 2025 01:40:43 GMT
- Title: Advancing Mobile UI Testing by Learning Screen Usage Semantics
- Authors: Safwat Ali Khan
- Abstract summary: This research seeks to enhance automated UI testing techniques by learning the screen usage semantics of mobile apps. It also improves the usability of a mobile app's interface by identifying and mitigating UI design issues.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The demand for quality in mobile applications has increased greatly, given users' high reliance on them for daily tasks. Developers work tirelessly to ensure that their applications are both functional and user-friendly. In pursuit of this, Automated Input Generation (AIG) tools have emerged as a promising solution for testing mobile applications by simulating user interactions and exploring app functionalities. However, these tools face significant challenges in navigating complex Graphical User Interfaces (GUIs), and developers often have trouble understanding their output. More specifically, AIG tools have difficulty navigating out of certain screens, such as login pages and advertisements, due to a lack of contextual understanding, which leads to suboptimal testing coverage. Furthermore, while AIG tools can provide interaction traces consisting of action and screen details, there is limited understanding of their coverage of higher-level functionalities, such as logging in, setting alarms, or saving notes. Understanding these covered use cases is essential to ensure comprehensive test coverage of app functionalities. Difficulty in testing mobile UIs can lead to the design of complex interfaces, which can adversely affect users of advanced age, who often face usability barriers due to small buttons, cluttered layouts, and unintuitive navigation. Many studies highlight these issues, but automated solutions for improving UI accessibility need more attention. This research seeks to enhance automated UI testing techniques by learning the screen usage semantics of mobile apps, helping them navigate more efficiently, offering more insights about tested functionalities, and improving the usability of a mobile app's interface by identifying and mitigating UI design issues.
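The abstract stops short of describing a mechanism, but the kind of "screen usage semantics" it targets can be illustrated with a small assumed sketch: a heuristic that recognizes a login screen from its widget hierarchy, exactly the kind of screen AIG tools reportedly fail to navigate past. The `Widget` fields and keyword cues below are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical widget record; real AIG tools extract richer attributes
# from the Android view hierarchy (e.g. via UIAutomator).
@dataclass
class Widget:
    cls: str          # e.g. "android.widget.EditText"
    text: str         # visible text or hint
    clickable: bool

LOGIN_CUES = {"password", "sign in", "log in", "username", "email"}

def looks_like_login_screen(widgets: list[Widget]) -> bool:
    """Heuristic: login screens usually pair text inputs with login cues."""
    texts = " ".join(w.text.lower() for w in widgets)
    has_cue = any(cue in texts for cue in LOGIN_CUES)
    has_input = any(w.cls.endswith("EditText") for w in widgets)
    return has_cue and has_input

widgets = [
    Widget("android.widget.EditText", "Email", False),
    Widget("android.widget.EditText", "Password", False),
    Widget("android.widget.Button", "Sign in", True),
]
print(looks_like_login_screen(widgets))  # True
```

A learned approach would replace the keyword list with a trained classifier over screen content, but the input and the decision it makes are of this shape.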
Related papers
- Screencast-Based Analysis of User-Perceived GUI Responsiveness [53.53923672866705]
The proposed tool measures GUI responsiveness directly from mobile screencasts. It uses computer vision to detect user interactions and analyzes frame-level visual changes to compute two key metrics. The tool has been deployed in an industrial testing pipeline and analyzes thousands of screencasts daily.
arXiv Detail & Related papers (2025-08-02T12:13:50Z)
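The summary names the ingredients (interaction detection, frame-level change analysis) but not the two metrics themselves. A minimal sketch of the frame-level idea, assuming OpenCV and a known tap timestamp, would measure how long after a tap the screen first visibly changes:

```python
import cv2
import numpy as np

def response_delay_ms(video_path: str, tap_frame: int,
                      diff_threshold: float = 2.0):
    """Milliseconds between a tap and the first visible screen change.

    Sketch of frame-level visual-change analysis: compare each frame
    after the tap against the tap-time frame using mean absolute pixel
    difference. The threshold and the metric are illustrative choices,
    not the paper's two metrics.
    """
    cap = cv2.VideoCapture(video_path)
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        cap.set(cv2.CAP_PROP_POS_FRAMES, tap_frame)
        ok, base = cap.read()
        if not ok:
            return None
        base_gray = cv2.cvtColor(base, cv2.COLOR_BGR2GRAY)
        idx = tap_frame
        while True:
            ok, frame = cap.read()
            if not ok:
                return None  # clip ended with no visible response
            idx += 1
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if np.mean(cv2.absdiff(gray, base_gray)) > diff_threshold:
                return (idx - tap_frame) * 1000.0 / fps
    finally:
        cap.release()
```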
- MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment [63.62778707277929]
MobileGUI-RL is a scalable framework that trains GUI agents in an online environment. It synthesizes a curriculum of learnable tasks through self-exploration and filtering, and adapts GRPO to GUI navigation with trajectory-aware advantages and composite rewards.
arXiv Detail & Related papers (2025-07-08T07:07:53Z)
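GRPO itself is public even though MobileGUI-RL's exact reward terms are not given here: rewards for a group of trajectories sampled on the same task are normalized within the group, removing the need for a learned critic. A sketch with a placeholder composite reward (the weights are not the paper's):

```python
import numpy as np

def composite_reward(task_success: float, num_steps: int) -> float:
    # Placeholder composite reward; MobileGUI-RL's actual terms and
    # weights are not specified in the summary.
    return task_success - 0.01 * num_steps

def group_relative_advantages(rewards: list[float], eps: float = 1e-8):
    """GRPO-style advantage: z-score each trajectory's reward within
    its task group, so no value function (critic) is needed."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# One task, four sampled GUI-agent trajectories:
rewards = [composite_reward(s, n) for s, n in [(1, 12), (0, 30), (1, 8), (0, 30)]]
print(group_relative_advantages(rewards))
```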
- AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning [82.42421823672954]
AgentCPM-GUI is built for robust and efficient on-device GUI interaction. Its training pipeline includes grounding-aware pre-training to enhance perception. AgentCPM-GUI achieves state-of-the-art performance on five public benchmarks.
arXiv Detail & Related papers (2025-06-02T07:30:29Z)
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent [66.34801160469067]
MLLM-based agents suffer from two key issues: misinterpreting UI components and outdated knowledge. We propose GUI-explorer, a training-free GUI agent that incorporates two fundamental mechanisms. With a task success rate of 53.7% on SPA-Bench and 47.4% on AndroidWorld, GUI-explorer shows significant improvements over SOTA agents.
arXiv Detail & Related papers (2025-05-22T16:01:06Z)
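The two mechanisms are not spelled out in the summary, but "transition-aware knowledge" plausibly amounts to remembering where each action on each screen led. A minimal, assumed representation is a map from (screen, action) to observed next screens with visit counts:

```python
from collections import defaultdict

class TransitionKnowledge:
    """Sketch of mined UI-transition knowledge: for each (screen, action)
    pair, count where the action led. Screens and actions are abstracted
    to strings here; a real agent would use screen fingerprints."""

    def __init__(self):
        self._edges = defaultdict(lambda: defaultdict(int))

    def record(self, screen: str, action: str, next_screen: str) -> None:
        self._edges[(screen, action)][next_screen] += 1

    def likely_outcome(self, screen: str, action: str):
        outcomes = self._edges.get((screen, action))
        if not outcomes:
            return None
        return max(outcomes, key=outcomes.get)

kb = TransitionKnowledge()
kb.record("settings", "tap:WiFi", "wifi_list")
kb.record("settings", "tap:WiFi", "wifi_list")
print(kb.likely_outcome("settings", "tap:WiFi"))  # wifi_list
```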
- Creating General User Models from Computer Use [62.91116265732001]
This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer. The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture user knowledge and preferences.
arXiv Detail & Related papers (2025-05-16T04:00:31Z)
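The concrete shape of a confidence-weighted proposition is not given in the summary; the record below is one plausible, assumed encoding of the idea, pairing a statement with a confidence and its supporting observation:

```python
from dataclasses import dataclass

@dataclass
class Proposition:
    """Assumed shape of a GUM proposition: a statement about the user,
    how confident the model is, and what evidence it saw."""
    statement: str            # e.g. "Prefers dark mode"
    confidence: float         # 0.0..1.0
    evidence: str             # pointer to the raw observation

def keep_confident(props: list[Proposition], threshold: float = 0.7):
    # Downstream consumers might act only on well-supported propositions.
    return [p for p in props if p.confidence >= threshold]

props = [
    Proposition("Prefers dark mode", 0.9, "screenshot_0412.png"),
    Proposition("Is planning a trip to Kyoto", 0.4, "screenshot_0413.png"),
]
for p in keep_confident(props):
    print(p.statement)
```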
- GUI Agents: A Survey [129.94551809688377]
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods.
arXiv Detail & Related papers (2024-12-18T04:48:28Z)
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
Aguvis is a vision-based framework for autonomous GUI agents. It standardizes cross-platform interactions and incorporates structured reasoning via inner monologue. It achieves state-of-the-art performance across offline and real-world online benchmarks.
arXiv Detail & Related papers (2024-12-05T18:58:26Z)
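How the standardized action space looks is not specified in the summary; one assumed rendering is a platform-agnostic action schema in which the agent emits its inner-monologue thought before a structured action:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Action:
    """Assumed cross-platform action schema: the same click/type/scroll
    vocabulary regardless of OS, with normalized screen coordinates."""
    kind: str                 # "click" | "type" | "scroll"
    x: float = 0.0            # normalized 0..1
    y: float = 0.0
    text: str = ""

def format_step(thought: str, action: Action) -> str:
    # Inner-monologue style: reasoning precedes the executable action.
    return f"Thought: {thought}\nAction: {json.dumps(asdict(action))}"

print(format_step(
    "The search field is at the top; I should click it before typing.",
    Action(kind="click", x=0.5, y=0.08),
))
```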
- SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation [89.24729958546168]
Smartphone agents are increasingly important for helping users control devices efficiently. We present SPA-Bench, a comprehensive SmartPhone Agent Benchmark designed to evaluate (M)LLM-based agents.
arXiv Detail & Related papers (2024-10-19T17:28:48Z)
- Exploring Accessibility Trends and Challenges in Mobile App Development: A Study of Stack Overflow Questions [14.005637416640448]
This study presents a large-scale empirical analysis of accessibility discussions on Stack Overflow to identify the trends and challenges Android and iOS developers face.
Our results show several challenges, including integrating assistive technologies like screen readers, ensuring accessible UI design, supporting text-to-speech across languages, and conducting accessibility testing.
We envision our findings driving improvements in developer practices, research directions, tool support, and educational resources.
arXiv Detail & Related papers (2024-09-12T11:13:24Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and local action taken.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
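The stated objective maps (current UI, local action) to a text description of the next UI state. A sketch of building such training pairs from an interaction trace, with an assumed trace format rather than OpenApp's actual schema:

```python
from dataclasses import dataclass

@dataclass
class TraceStep:
    screenshot: str      # path to the screen image
    action: str          # e.g. "tap 'New alarm'"
    caption: str         # global text description of this screen

def foresight_pairs(trace: list[TraceStep]):
    """Build ((current screen, action) -> next-screen description) pairs,
    the supervision Textual Foresight describes. The TraceStep fields are
    assumptions about how trace data could be organized."""
    pairs = []
    for cur, nxt in zip(trace, trace[1:]):
        inputs = {"image": cur.screenshot, "action": cur.action}
        target = nxt.caption  # model learns to describe the future state
        pairs.append((inputs, target))
    return pairs

trace = [
    TraceStep("s0.png", "tap 'New alarm'", "Alarm list with one alarm"),
    TraceStep("s1.png", "tap 'Save'", "Time picker for a new alarm"),
]
for x, y in foresight_pairs(trace):
    print(x, "->", y)
```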
- MotorEase: Automated Detection of Motor Impairment Accessibility Issues in Mobile App UIs [8.057618278428494]
MotorEase identifies accessibility issues in mobile app UIs that impact motor-impaired users. It adapts computer vision and text processing techniques to enable a semantic understanding of app UI screens. It identifies violations with an average accuracy of 90% and a false positive rate below 9%.
arXiv Detail & Related papers (2024-03-20T15:53:07Z)
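The summary does not enumerate MotorEase's checkers, but a representative motor-impairment violation is an undersized touch target; Android's accessibility guidance recommends at least 48x48dp. The generic check below illustrates that guideline, not MotorEase's actual detector:

```python
from dataclasses import dataclass

MIN_TOUCH_DP = 48  # Android accessibility guidance for touch targets

@dataclass
class Element:
    id: str
    width_px: int
    height_px: int
    clickable: bool

def undersized_targets(elements: list[Element], density: float):
    """Flag clickable elements smaller than 48x48dp.
    density = pixels per dp (e.g. 2.75 on a 440dpi phone)."""
    min_px = MIN_TOUCH_DP * density
    return [e for e in elements
            if e.clickable and (e.width_px < min_px or e.height_px < min_px)]

screen = [
    Element("btn_close", 40, 40, True),     # too small at density 2.75
    Element("btn_save", 200, 140, True),
]
for e in undersized_targets(screen, density=2.75):
    print(f"{e.id}: touch target below {MIN_TOUCH_DP}dp")
```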
- Vision-Based Mobile App GUI Testing: A Survey [29.042723121518765]
Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies.
We provide a comprehensive investigation of the state-of-the-art techniques on 271 papers, among which 92 are vision-based studies.
arXiv Detail & Related papers (2023-10-20T14:04:04Z)
- Towards Automated Accessibility Report Generation for Mobile Apps [14.908672785900832]
We propose a system to generate whole-app accessibility reports.
It combines varied data collection methods (e.g., app crawling, manual recording) with an existing accessibility scanner.
arXiv Detail & Related papers (2023-09-29T19:05:11Z)
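The summary describes combining crawl data with an existing scanner; the aggregation step sketched below, including the issue labels, is an assumption about how per-screen findings could be folded into a whole-app report:

```python
from collections import Counter

def build_report(scans: dict[str, list[str]]) -> str:
    """Fold per-screen scanner findings (screen -> issue labels) into a
    whole-app summary. The labels are illustrative; a real scanner such
    as Android's Accessibility Scanner emits richer records."""
    totals = Counter(issue for issues in scans.values() for issue in issues)
    lines = [f"Screens crawled: {len(scans)}"]
    for issue, count in totals.most_common():
        lines.append(f"{issue}: {count} occurrence(s)")
    return "\n".join(lines)

scans = {
    "LoginScreen": ["missing content description", "low contrast"],
    "SettingsScreen": ["low contrast"],
}
print(build_report(scans))
```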
- ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces [12.52699475631247]
We introduce a new pre-trained UI representation model called ActionBert.
Our methodology is designed to leverage visual, linguistic, and domain-specific features in user interaction traces to pre-train generic feature representations of UIs and their components.
Experiments show that the proposed ActionBert model outperforms multi-modal baselines across all downstream tasks by up to 15.5%.
arXiv Detail & Related papers (2020-12-22T20:49:52Z)
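ActionBert fuses visual, linguistic, and domain-specific signals from interaction traces; the exact feature set is not given in the summary, so the per-element encoding below is an assumed illustration of combining the three:

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    text: str            # linguistic signal
    icon_embedding: list # visual signal (e.g. from an image encoder)
    element_type: str    # domain-specific signal, e.g. "BUTTON"

def to_pretraining_records(trace: list[tuple[UIElement, str]]):
    """Turn an (element, action) interaction trace into records a
    BERT-style encoder could consume. The field names and the flat
    concatenation are assumptions, not ActionBert's actual input."""
    records = []
    for element, action in trace:
        records.append({
            "text": element.text,
            "type": element.element_type,
            "visual": element.icon_embedding,
            "action": action,
        })
    return records

trace = [(UIElement("Sign in", [0.1, 0.7], "BUTTON"), "CLICK")]
print(to_pretraining_records(trace))
```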