Related papers: AXNav: Replaying Accessibility Tests from Natural Language

AXNav: Replaying Accessibility Tests from Natural Language

URL: http://arxiv.org/abs/2310.02424v3
Date: Tue, 5 Mar 2024 01:28:25 GMT
Title: AXNav: Replaying Accessibility Tests from Natural Language
Authors: Maryam Taeb, Amanda Swearngin, Eldon Schoop, Ruijia Cheng, Yue Jiang, Jeffrey Nichols
Abstract summary: Large Language Models (LLMs) have been used for a variety of tasks including automation of UIs. This paper explores the requirements of a natural language based accessibility testing workflow. We build a system that takes as input a manual accessibility test (e.g., Search for a show in VoiceOver'') and uses an LLM combined with pixel-based UI Understanding models to execute the test.
Score: 14.131076040673351
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Developers and quality assurance testers often rely on manual testing to test accessibility features throughout the product lifecycle. Unfortunately, manual testing can be tedious, often has an overwhelming scope, and can be difficult to schedule amongst other development milestones. Recently, Large Language Models (LLMs) have been used for a variety of tasks including automation of UIs, however to our knowledge no one has yet explored their use in controlling assistive technologies for the purposes of supporting accessibility testing. In this paper, we explore the requirements of a natural language based accessibility testing workflow, starting with a formative study. From this we build a system that takes as input a manual accessibility test (e.g., ``Search for a show in VoiceOver'') and uses an LLM combined with pixel-based UI Understanding models to execute the test and produce a chaptered, navigable video. In each video, to help QA testers we apply heuristics to detect and flag accessibility issues (e.g., Text size not increasing with Large Text enabled, VoiceOver navigation loops). We evaluate this system through a 10 participant user study with accessibility QA professionals who indicated that the tool would be very useful in their current work and performed tests similarly to how they would manually test the features. The study also reveals insights for future work on using LLMs for accessibility testing.

Related papers

Automated Soap Opera Testing Directed by LLMs and Scenario Knowledge: Feasibility, Challenges, and Road Ahead [43.15092098658384]
Exploratory testing (ET) harnesses tester's knowledge, creativity, and experience to create varying tests that uncover unexpected bugs from the end-user's perspective. We explore the feasibility, challenges and road ahead of automated scenario-based ET (a.k.a soap opera testing)
arXiv Detail & Related papers (2024-12-11T17:57:23Z)
Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
Multi-language Unit Test Generation using LLMs [6.259245181881262]
We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases. We show how the pipeline can be applied to different programming languages, specifically Java and Python, and to complex software requiring environment mocking. Our results demonstrate that LLM-based test generation, when guided by static analysis, can be competitive with, and even outperform, state-of-the-art test-generation techniques in coverage achieved.
arXiv Detail & Related papers (2024-09-04T21:46:18Z)
Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. We evaluate the performance of LLMs tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench. We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training [55.321010757641524]
A major public concern regarding the training of large language models (LLMs) is whether they abusing copyrighted online text. Previous membership inference methods may be misled by similar examples in vast amounts of training data. We propose an alternative textitinsert-and-detection methodology, advocating that web users and content platforms employ textbftextitunique identifiers.
arXiv Detail & Related papers (2024-03-23T06:36:32Z)
Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques. We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z)
Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing [0.0]
A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content. LLM can play a pivotal role in software development, including software testing. This study explores the practical application of LLMs in software testing within an industrial setting.
arXiv Detail & Related papers (2023-12-08T06:30:37Z)
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions [23.460051600514806]
GPTDroid is a Q&A-based GUI testing framework for mobile apps. We introduce a functionality-aware memory prompting mechanism. It outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate.
arXiv Detail & Related papers (2023-10-24T12:30:26Z)
LLM for Test Script Generation and Migration: Challenges, Capabilities, and Opportunities [8.504639288314063]
Test script generation is a vital component of software testing, enabling efficient and reliable automation of repetitive test tasks. Existing generation approaches often encounter limitations, such as difficulties in accurately capturing and reproducing test scripts across diverse devices, platforms, and applications. This paper investigates the application of large language models (LLM) in the domain of mobile application test script generation.
arXiv Detail & Related papers (2023-09-24T07:58:57Z)
Towards Autonomous Testing Agents via Conversational Large Language Models [18.302956037305112]
Large language models (LLMs) can be used as automated testing assistants. We present a taxonomy of LLM-based testing agents based on their level of autonomy.
arXiv Detail & Related papers (2023-06-08T12:22:38Z)
BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models [73.29106813131818]
bias testing is currently cumbersome since the test sentences are generated from a limited set of manual templates or need expensive crowd-sourcing. We propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes. We present an open-source comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing.
arXiv Detail & Related papers (2023-02-14T22:07:57Z)
Towards Informed Design and Validation Assistance in Computer Games Using Imitation Learning [65.12226891589592]
This paper proposes a new approach to automated game validation and testing. Our method leverages a data-driven imitation learning technique, which requires little effort and time and no knowledge of machine learning or programming.
arXiv Detail & Related papers (2022-08-15T11:08:44Z)
UKP-SQUARE: An Online Platform for Question Answering Research [50.35348764297317]
We present UKP-SQUARE, an online QA platform for researchers which allows users to query and analyze a large collection of modern Skills. UKP-SQUARE allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated tests.
arXiv Detail & Related papers (2022-03-25T15:00:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.