Are your apps accessible? A GCN-based accessibility checker for low vision users
- URL: http://arxiv.org/abs/2502.14288v1
- Date: Thu, 20 Feb 2025 06:04:06 GMT
- Title: Are your apps accessible? A GCN-based accessibility checker for low vision users
- Authors: Mengxi Zhang, Huaxiao Liu, Shenning Song, Chunyang Chen, Pei Huang, Jian Zhao
- Abstract summary: We propose a novel approach, named ALVIN, which represents the Graphical User Interface as a graph and adopts Graph Convolutional Neural Networks (GCN) to label inaccessible components.
Experiments on 48 apps demonstrate the effectiveness of ALVIN, with precision of 83.5%, recall of 78.9%, and F1-score of 81.2%, outperforming baseline methods.
- Score: 22.747735521796077
- License:
- Abstract: Context: Accessibility issues (e.g., small size and narrow interval) in mobile applications (apps) create obstacles for billions of low vision users interacting with Graphical User Interfaces (GUIs). Although GUI accessibility scanning tools exist, most of them perform rule-based checks that rely on complex GUI hierarchies. As a result, they may detect invisible redundant information, fail to handle small deviations, omit similar components, and are hard to extend. Objective: In this paper, we propose a novel approach, named ALVIN (Accessibility Checker for Low Vision), which represents the GUI as a graph and adopts Graph Convolutional Neural Networks (GCN) to label inaccessible components. Method: ALVIN removes invisible views to avoid detecting redundancy and uses annotations from low vision users to handle small deviations. The GCN model also considers the relations between GUI components, connecting similar components and reducing the chance of omission. ALVIN only requires users to annotate a relevant dataset when detecting new kinds of issues. Results: Our experiments on 48 apps demonstrate the effectiveness of ALVIN, with a precision of 83.5%, recall of 78.9%, and F1-score of 81.2%, outperforming baseline methods. In RQ2, its usefulness is verified through 20 issues submitted to open-source apps. RQ3 further shows that the GCN model outperforms alternative models. Conclusion: In summary, our proposed approach can effectively detect accessibility issues in GUIs for low vision users, thereby guiding developers in fixing them efficiently.
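To make the general technique named in the abstract concrete, here is a minimal sketch: each GUI component becomes a graph node with layout features, related components are connected by edges, and a two-layer GCN emits a per-component accessibility label. This is not ALVIN's implementation; the feature set, adjacency, dimensions, and labels are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNAccessibilityChecker(nn.Module):
    """Two-layer GCN that labels each GUI component (node) as
    accessible (0) or inaccessible (1). Illustrative sketch only."""

    def __init__(self, in_dim: int, hidden_dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, num_classes, bias=False)

    @staticmethod
    def normalize(adj: torch.Tensor) -> torch.Tensor:
        # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
        a_hat = adj + torch.eye(adj.size(0))
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        return d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_norm = self.normalize(adj)
        h = F.relu(self.w1(a_norm @ x))   # propagate neighbor features, then transform
        return self.w2(a_norm @ h)        # per-node class logits

# Toy GUI graph: 4 visible components, features = [x, y, width, height, is_clickable]
features = torch.tensor([
    [0.10, 0.05, 0.30, 0.04, 1.0],   # small clickable button (the kind a checker might flag)
    [0.10, 0.20, 0.80, 0.10, 1.0],
    [0.10, 0.35, 0.80, 0.10, 0.0],
    [0.55, 0.05, 0.30, 0.04, 1.0],   # another small, narrowly spaced button
])
# Edges connect spatially adjacent or similar components (undirected adjacency matrix).
adj = torch.tensor([
    [0., 1., 0., 1.],
    [1., 0., 1., 0.],
    [0., 1., 0., 0.],
    [1., 0., 0., 0.],
])

model = GCNAccessibilityChecker(in_dim=5)
logits = model(features, adj)
print(logits.argmax(dim=1))  # predicted label per component (untrained, so arbitrary)
```

In practice such a model would be trained with node-level labels collected from low vision users, as the abstract describes.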
Related papers
- GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration [56.58744345634623]
We propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration.
We also introduce NovelScreenSpot, a benchmark for testing how well the data can help align GUI action grounding models to novel environments.
arXiv Detail & Related papers (2025-01-23T18:16:21Z)
- Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation [53.1000575179389]
We propose a Retrieval-Augmented GUI Generation (RAGG) approach, integrated with an LLM-based GUI retrieval re-ranking and filtering mechanism.
In addition, we adapt Prompt Decomposition (PDGG) and Self-Critique (SCGG) for GUI generation.
Our evaluation, which encompasses over 3,000 GUI annotations from over 100 crowd-workers with UI/UX experience, shows that SCGG, in contrast to PDGG and RAGG, can lead to more effective GUI generation.
arXiv Detail & Related papers (2024-12-15T22:17:30Z)
- Falcon-UI: Understanding GUI Before Following User Instructions [57.67308498231232]
We introduce an instruction-free GUI navigation dataset, termed Insight-UI dataset, to enhance model comprehension of GUI environments.
Insight-UI dataset is automatically generated from the Common Crawl corpus, simulating various platforms.
We develop the GUI agent model Falcon-UI, which is initially pretrained on Insight-UI dataset and subsequently fine-tuned on Android and Web GUI datasets.
arXiv Detail & Related papers (2024-12-12T15:29:36Z)
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, namely ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z)
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- Graph4GUI: Graph Neural Networks for Representing Graphical User Interfaces [27.84098739594353]
Graph4GUI exploits graph neural networks to capture individual elements' properties and semantic-visuo-spatial constraints in a layout.
The learned representation demonstrated its effectiveness in multiple tasks, especially generating designs in a challenging GUI autocompletion task.
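To illustrate the graph-building side of this idea (a hedged sketch, not Graph4GUI's pipeline), GUI elements can become nodes carrying bounding-box features, with an edge whenever two elements are spatially close; the distance threshold and feature choice below are assumptions.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Element:
    """A GUI element described by its bounding box (normalized screen coordinates)."""
    name: str
    x: float
    y: float
    w: float
    h: float

def center(e: Element) -> tuple[float, float]:
    return (e.x + e.w / 2, e.y + e.h / 2)

def build_gui_graph(elements: list[Element], max_dist: float = 0.25):
    """Connect elements whose centers are within `max_dist`, producing an
    edge list a GNN could consume alongside per-node features."""
    edges = []
    for i, j in combinations(range(len(elements)), 2):
        (xi, yi), (xj, yj) = center(elements[i]), center(elements[j])
        if ((xi - xj) ** 2 + (yi - yj) ** 2) ** 0.5 <= max_dist:
            edges.append((i, j))
    features = [[e.x, e.y, e.w, e.h] for e in elements]
    return features, edges

elements = [
    Element("search_box", 0.05, 0.05, 0.70, 0.08),
    Element("search_btn", 0.78, 0.05, 0.17, 0.08),
    Element("result_list", 0.05, 0.20, 0.90, 0.70),
]
features, edges = build_gui_graph(elements)
print(edges)  # pairs of node indices connected by spatial proximity
```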
arXiv Detail & Related papers (2024-04-21T04:06:09Z)
- Sheaf4Rec: Sheaf Neural Networks for Graph-based Recommender Systems [18.596875449579688]
We propose a cutting-edge model inspired by category theory: Sheaf4Rec.
Unlike single vector representations, Sheaf Neural Networks and their corresponding Laplacians represent each node (and edge) using a vector space.
Our proposed model exhibits a noteworthy relative improvement of up to 8.53% on F1-Score@10 and an impressive increase of up to 11.29% on NDCG@10.
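A toy sketch of the underlying construction (not Sheaf4Rec's code): each node carries its own vector space (its stalk), each edge carries restriction maps, and the sheaf Laplacian is built from the coboundary operator. The dimensions and maps below are invented purely for illustration.

```python
import numpy as np

# Toy cellular sheaf on a graph with 2 nodes (user u, item v) and 1 edge.
# Node stalks: R^2 for u, R^3 for v; edge stalk: R^2.
F_u = np.array([[1.0, 0.0],
                [0.0, 1.0]])            # restriction map: stalk(u) -> stalk(e)
F_v = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])       # restriction map: stalk(v) -> stalk(e)

# The coboundary delta acts on the concatenated node signal [x_u; x_v]:
#   (delta x)_e = F_u x_u - F_v x_v
delta = np.hstack([F_u, -F_v])          # shape (2, 5)

# Sheaf Laplacian L = delta^T delta; signals with L x = 0 agree across the edge.
L = delta.T @ delta

x_u = np.array([1.0, 2.0])
x_v = np.array([1.0, 2.0, 0.0])          # chosen so that F_u x_u == F_v x_v
x = np.concatenate([x_u, x_v])
print(np.allclose(L @ x, 0))             # True: this signal is "consistent" on the sheaf
```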
arXiv Detail & Related papers (2023-04-07T07:03:54Z)
- Graph Neural Networks for Inconsistent Cluster Detection in Incremental Entity Resolution [3.4806267677524896]
In mature data repositories, the relationships may be mostly correct but require incremental improvements owing to errors in the original data or in the entity resolution system.
This paper proposes a novel method for identifying inconsistent clusters (IC), existing groups of related products that do not belong together.
We demonstrate that existing Message Passing neural networks perform well at this task, exceeding traditional graph processing techniques.
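As a hedged sketch of the general technique (not the paper's model): one round of message passing updates product-node embeddings from their cluster neighbors, and an edge scorer then flags pairs that may not belong together. All layer names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of mean-aggregation message passing over a dense adjacency matrix."""
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        messages = (adj @ x) / deg                     # mean of neighbor embeddings
        return torch.relu(self.update(torch.cat([x, messages], dim=1)))

class EdgeConsistencyScorer(nn.Module):
    """Scores each linked pair of products; a high score marks a likely inconsistent edge."""
    def __init__(self, dim: int):
        super().__init__()
        self.mp = MessagePassingLayer(dim)
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, x, adj, edge_pairs):
        h = self.mp(x, adj)
        pairs = torch.cat([h[edge_pairs[:, 0]], h[edge_pairs[:, 1]]], dim=1)
        return torch.sigmoid(self.score(pairs)).squeeze(-1)

# Toy cluster of 3 products with 4-dimensional attribute embeddings.
x = torch.randn(3, 4)
adj = torch.tensor([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
edge_pairs = torch.tensor([[0, 1], [0, 2], [1, 2]])
print(EdgeConsistencyScorer(dim=4)(x, adj, edge_pairs))  # untrained scores in (0, 1)
```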
arXiv Detail & Related papers (2021-05-12T20:39:22Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
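To illustrate the matching scheme described in the ConsNet entry above (a sketch under assumed names and dimensions, not ConsNet itself): visual features of candidate human-object pairs and word embeddings of HOI labels are projected into a shared space and ranked by cosine similarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingMatcher(nn.Module):
    """Maps visual features and label embeddings into a shared space and
    scores HOI labels by cosine similarity (illustrative sketch)."""
    def __init__(self, visual_dim: int, text_dim: int, joint_dim: int = 128):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, joint_dim)
        self.text_proj = nn.Linear(text_dim, joint_dim)

    def forward(self, visual_feats: torch.Tensor, label_embs: torch.Tensor) -> torch.Tensor:
        v = F.normalize(self.visual_proj(visual_feats), dim=-1)   # (num_pairs, joint_dim)
        t = F.normalize(self.text_proj(label_embs), dim=-1)       # (num_labels, joint_dim)
        return v @ t.T                                            # cosine similarity matrix

# Two candidate human-object pairs, three HOI labels (e.g. averaged word embeddings).
visual_feats = torch.randn(2, 2048)
label_embs = torch.randn(3, 300)
scores = JointEmbeddingMatcher(visual_dim=2048, text_dim=300)(visual_feats, label_embs)
print(scores.argmax(dim=1))  # best-matching HOI label per candidate pair (untrained)
```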
This list is automatically generated from the titles and abstracts of the papers on this site.