Don't Confuse! Redrawing GUI Navigation Flow in Mobile Apps for Visually Impaired Users
- URL: http://arxiv.org/abs/2502.15137v1
- Date: Fri, 21 Feb 2025 01:33:04 GMT
- Title: Don't Confuse! Redrawing GUI Navigation Flow in Mobile Apps for Visually Impaired Users
- Authors: Mengxi Zhang, Huaxiao Liu, Yuheng Zhou, Chunyang Chen, Pei Huang, Jian Zhao
- Abstract summary: It remains unclear if visually impaired users, who rely solely on screen readers to navigate and access app information, can do so in the correct and reasonable order. Considering these issues, we proposed a method named RGNF (Re-draw GUI Navigation Flow). It aimed to enhance the understandability and coherence of accessing the content of each component within the Graphical User Interface (GUI).
- Score: 22.747735521796077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile applications (apps) are integral to our daily lives, offering diverse services and functionalities. They enable sighted users to access information coherently and in an extremely convenient manner. However, it remains unclear whether visually impaired users, who rely solely on screen readers (e.g., Talkback) to navigate and access app information, can do so in a correct and reasonable order. Navigating in the wrong order may result in significant information bias and operational errors. Considering these issues, in this work we proposed a method named RGNF (Re-draw GUI Navigation Flow). It aimed to enhance the understandability and coherence of accessing the content of each component within the Graphical User Interface (GUI), and to assist developers in creating well-designed GUI navigation flows (GNFs). The method was inspired by the characteristics identified in our preliminary study, where visually impaired users expected consecutively read GUI components to be close in position and similar in shape. Thus, our method relied on principles derived from the Gestalt psychological model, grouping GUI components into different regions according to the laws of proximity and similarity and thereby redrawing the GNFs. To evaluate the effectiveness of our method, we calculated sequence similarity values before and after redrawing the GNF, and further employed the tools proposed by Alotaibi et al. to measure the reachability of GUI components. Our results demonstrated a substantial improvement in similarity (0.921 versus 0.624 for the baseline) and in reachability (90.31% versus 74.35% for the baseline GNF). Furthermore, a qualitative user study showed that our method provides visually impaired users with an improved user experience.
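To make the grouping and evaluation ideas in the abstract above concrete, here is a minimal sketch (not the authors' implementation) of grouping GUI components by the Gestalt laws of proximity and similarity, reading the resulting regions in order, and checking the sequence similarity of the produced order against an expected reading order. The `Component` representation, the distance and shape thresholds, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of Gestalt-based grouping for a GUI navigation flow.
# Layout values, thresholds, and the similarity metric are illustrative
# assumptions, not the paper's actual algorithm or parameters.
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Component:
    id: str
    x: float  # top-left x of the bounding box
    y: float  # top-left y of the bounding box
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def proximity(a: Component, b: Component) -> float:
    """Euclidean distance between component centers (law of proximity)."""
    (ax, ay), (bx, by) = a.center, b.center
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


def similar_shape(a: Component, b: Component, tol: float = 0.25) -> bool:
    """Shapes count as similar if width and height differ by at most `tol` (law of similarity)."""
    return (abs(a.w - b.w) <= tol * max(a.w, b.w)
            and abs(a.h - b.h) <= tol * max(a.h, b.h))


def group_components(components, dist_threshold: float = 120.0):
    """Greedy single-link grouping: a component joins a region when it is
    close to AND similarly shaped as some member of that region."""
    groups = []
    for c in components:
        for g in groups:
            if any(proximity(c, m) <= dist_threshold and similar_shape(c, m) for m in g):
                g.append(c)
                break
        else:
            groups.append([c])
    return groups


def redraw_gnf(components):
    """Order regions top-to-bottom, then read each region top-to-bottom, left-to-right."""
    groups = group_components(components)
    groups.sort(key=lambda g: min(m.y for m in g))
    order = []
    for g in groups:
        order.extend(sorted(g, key=lambda m: (m.y, m.x)))
    return [m.id for m in order]


def sequence_similarity(expected, actual) -> float:
    """Similarity between an expected reading order and a GNF, in [0, 1]."""
    return SequenceMatcher(None, expected, actual).ratio()


if __name__ == "__main__":
    # One large banner plus two rows of similar buttons (illustrative layout).
    ui = [
        Component("banner", 0, 0, 360, 80),
        Component("btn_a", 10, 100, 100, 40), Component("btn_b", 130, 100, 100, 40),
        Component("btn_c", 10, 150, 100, 40), Component("btn_d", 130, 150, 100, 40),
    ]
    expected = ["banner", "btn_a", "btn_b", "btn_c", "btn_d"]
    redrawn = redraw_gnf(ui)
    print(redrawn)
    print(f"similarity vs. expected order: {sequence_similarity(expected, redrawn):.3f}")
```

A greedy single-link pass is enough to illustrate the grouping idea; the paper's actual grouping procedure and the reachability tooling by Alotaibi et al. are not reproduced here.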
Related papers
- GUI-ReRank: Enhancing GUI Retrieval with Multi-Modal LLM-based Reranking [55.762798168494726]
GUI-ReRank is a novel framework that integrates rapid embedding-based constrained retrieval models with highly effective MLLM-based reranking techniques. We evaluated our approach on an established NL-based GUI retrieval benchmark.
arXiv Detail & Related papers (2025-08-05T10:17:38Z)
- Screencast-Based Analysis of User-Perceived GUI Responsiveness [53.53923672866705]
The tool measures GUI responsiveness directly from mobile screencasts. It uses computer vision to detect user interactions and analyzes frame-level visual changes to compute two key metrics. The tool has been deployed in an industrial testing pipeline and analyzes thousands of screencasts daily.
arXiv Detail & Related papers (2025-08-02T12:13:50Z)
- MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment [63.62778707277929]
MobileGUI-RL is a scalable framework that trains GUI agents in an online environment. It synthesizes a curriculum of learnable tasks through self-exploration and filtering. It adapts GRPO to GUI navigation with trajectory-aware advantages and composite rewards.
arXiv Detail & Related papers (2025-07-08T07:07:53Z)
- Learning, Reasoning, Refinement: A Framework for Kahneman's Dual-System Intelligence in GUI Agents [15.303188467166752]
We present CogniGUI, a cognitive framework developed to overcome limitations by enabling adaptive learning for GUI automation that resembles human-like behavior. To assess the generalization and adaptability of agent systems, we introduce ScreenSeek, a comprehensive benchmark that includes multi-application navigation, dynamic state transitions, and cross-interface coherence. Experimental results demonstrate that CogniGUI surpasses state-of-the-art methods on both current GUI grounding benchmarks and our newly proposed benchmark.
arXiv Detail & Related papers (2025-06-22T06:30:52Z)
- AccessFixer: Enhancing GUI Accessibility for Low Vision Users With R-GCN Model [32.47608503609055]
We propose a novel approach named AccessFixer to fix accessibility issues in Graphical User Interfaces (GUIs). With AccessFixer, the fixed GUIs would have a consistent color palette, uniform intervals, and adequate size changes achieved through coordinated adjustments to the attributes of related components. We apply AccessFixer to 10 open-source apps by submitting the fixed results via pull requests on GitHub.
arXiv Detail & Related papers (2025-02-21T01:52:51Z)
- Are your apps accessible? A GCN-based accessibility checker for low vision users [22.747735521796077]
We propose a novel approach, named ALVIN, which represents the Graphical User Interface as a graph and adopts a Graph Convolutional Network (GCN) to label inaccessible components. Experiments on 48 apps demonstrate the effectiveness of ALVIN, with a precision of 83.5%, recall of 78.9%, and F1-score of 81.2%, outperforming baseline methods.
arXiv Detail & Related papers (2025-02-20T06:04:06Z)
- GUI Agents: A Survey [129.94551809688377]
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods.
arXiv Detail & Related papers (2024-12-18T04:48:28Z)
- Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation [53.1000575179389]
We propose a Retrieval-Augmented GUI Generation (RAGG) approach, integrated with an LLM-based GUI retrieval re-ranking and filtering mechanism. In addition, we adapt Prompt Decomposition (PDGG) and Self-Critique (SCGG) for GUI generation. Our evaluation, which encompasses over 3,000 GUI annotations from over 100 crowd-workers with UI/UX experience, shows that SCGG, in contrast to PDGG and RAGG, can lead to more effective GUI generation.
arXiv Detail & Related papers (2024-12-15T22:17:30Z)
- Falcon-UI: Understanding GUI Before Following User Instructions [57.67308498231232]
We introduce an instruction-free GUI navigation dataset, termed the Insight-UI dataset, to enhance model comprehension of GUI environments. The Insight-UI dataset is automatically generated from the Common Crawl corpus, simulating various platforms. We develop the GUI agent model Falcon-UI, which is initially pretrained on the Insight-UI dataset and subsequently fine-tuned on Android and web GUI datasets.
arXiv Detail & Related papers (2024-12-12T15:29:36Z)
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
We introduce Aguvis, a unified vision-based framework for autonomous GUI agents. Our approach leverages image-based observations and grounds natural-language instructions to visual elements. To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
arXiv Detail & Related papers (2024-12-05T18:58:26Z)
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model in the digital world, namely ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- Vision-Based Mobile App GUI Testing: A Survey [29.042723121518765]
Vision-based mobile app GUI testing approaches emerged with the development of computer vision technologies.
We provide a comprehensive investigation of the state-of-the-art techniques on 271 papers, among which 92 are vision-based studies.
arXiv Detail & Related papers (2023-10-20T14:04:04Z)
- Psychologically-Inspired, Unsupervised Inference of Perceptual Groups of GUI Widgets from GUI Images [21.498096538797952]
We present a novel unsupervised image-based method for inferring perceptual groups of GUI widgets.
The evaluation on a dataset of 1,091 GUIs collected from 772 mobile apps and 20 UI design mockups shows that our method significantly outperforms the state-of-the-art ad-hoc-based baseline.
arXiv Detail & Related papers (2022-06-15T05:16:03Z)
- Understanding Visual Saliency in Mobile User Interfaces [31.278845008743698]
We present findings from a controlled study with 30 participants and 193 mobile UIs.
Results speak to the role of expectations in guiding where users look.
We release the first annotated dataset for investigating visual saliency in mobile UIs.
arXiv Detail & Related papers (2021-01-22T15:45:13Z)
- User-Guided Domain Adaptation for Rapid Annotation from User Interactions: A Study on Pathological Liver Segmentation [49.96706092808873]
Mask-based annotation of medical images, especially for 3D data, is a bottleneck in developing reliable machine learning models.
We propose the user-guided domain adaptation (UGDA) framework, which uses prediction-based adversarial domain adaptation (PADA) to model the combined distribution of UIs and mask predictions.
We show UGDA can retain this state-of-the-art performance even when only seeing a fraction of available UIs.
arXiv Detail & Related papers (2020-09-05T04:24:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.