AXNav: Replaying Accessibility Tests from Natural Language
- URL: http://arxiv.org/abs/2310.02424v3
- Date: Tue, 5 Mar 2024 01:28:25 GMT
- Title: AXNav: Replaying Accessibility Tests from Natural Language
- Authors: Maryam Taeb, Amanda Swearngin, Eldon Schoop, Ruijia Cheng, Yue Jiang,
Jeffrey Nichols
- Abstract summary: Large Language Models (LLMs) have been used for a variety of tasks including automation of UIs.
This paper explores the requirements of a natural language based accessibility testing workflow.
We build a system that takes as input a manual accessibility test (e.g., "Search for a show in VoiceOver") and uses an LLM combined with pixel-based UI understanding models to execute the test.
- Score: 14.131076040673351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers and quality assurance testers often rely on manual testing to test
accessibility features throughout the product lifecycle. Unfortunately, manual
testing can be tedious, often has an overwhelming scope, and can be difficult
to schedule amongst other development milestones. Recently, Large Language
Models (LLMs) have been used for a variety of tasks including automation of
UIs; however, to our knowledge, no one has yet explored their use in controlling
assistive technologies for the purposes of supporting accessibility testing. In
this paper, we explore the requirements of a natural language based
accessibility testing workflow, starting with a formative study. From this we
build a system that takes as input a manual accessibility test (e.g., "Search
for a show in VoiceOver") and uses an LLM combined with pixel-based UI
understanding models to execute the test and produce a chaptered, navigable
video. In each video, to help QA testers, we apply heuristics to detect and flag
accessibility issues (e.g., text size not increasing with Large Text enabled,
VoiceOver navigation loops). We evaluate this system through a 10 participant
user study with accessibility QA professionals who indicated that the tool
would be very useful in their current work and performed tests similarly to how
they would manually test the features. The study also reveals insights for
future work on using LLMs for accessibility testing.
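To make the workflow described above concrete, here is a minimal sketch of the planner/executor loop the abstract implies: an LLM plans the next VoiceOver action from the natural-language test, a pixel-based UI understanding model grounds that action on a screenshot, and each executed step becomes a chapter in the output video. Every name in the sketch (replay_test, ui_model.detect_elements, device.perform_voiceover_action, and so on) is an assumed placeholder, not the AXNav paper's actual API.

```python
# Hypothetical sketch of an LLM-driven accessibility test replayer.
# None of these names come from the AXNav paper; they only illustrate
# the planner -> UI-understanding -> executor loop the abstract describes.
import json
from dataclasses import dataclass, field


@dataclass
class VideoChapter:
    title: str                                         # e.g. "Activate the Search tab"
    start_s: float                                      # offset into the recording
    flags: list[str] = field(default_factory=list)      # issues detected in this step


def replay_test(llm, ui_model, device, test_instruction: str) -> list[VideoChapter]:
    """Execute a natural-language accessibility test and return video chapters."""
    chapters: list[VideoChapter] = []
    device.start_recording()
    while True:
        screenshot = device.screenshot()
        elements = ui_model.detect_elements(screenshot)  # pixel-based UI understanding
        prompt = (
            f"Manual accessibility test: {test_instruction}\n"
            f"On-screen elements: {[e.label for e in elements]}\n"
            'Reply with JSON {"action": ..., "target": ..., "summary": ...} or DONE.'
        )
        reply = llm.complete(prompt)                     # assumed LLM client
        if reply.strip() == "DONE":
            break
        step = json.loads(reply)
        target = next(e for e in elements if e.label == step["target"])
        device.perform_voiceover_action(step["action"], target)
        chapters.append(VideoChapter(step["summary"], device.elapsed_seconds()))
    device.stop_recording()
    return chapters
```

The loop ends when the LLM reports the test is complete; in practice a step budget and error handling for unmatched targets would also be needed.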
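The abstract also mentions heuristics that flag issues such as VoiceOver navigation loops. One such heuristic, detecting that the VoiceOver focus trace ends in a repeating cycle rather than progressing through the screen, could look roughly like the sketch below; the function name, inputs, and threshold are illustrative assumptions rather than the paper's implementation.

```python
def detect_navigation_loop(focus_trace: list[str], min_repeats: int = 2) -> bool:
    """Flag a VoiceOver navigation loop: swiping forward revisits the same
    elements instead of reaching the end of the screen.

    focus_trace: accessibility identifiers of focused elements, in swipe order.
    Returns True if the trace ends in a cycle repeated at least min_repeats times.
    """
    n = len(focus_trace)
    for cycle_len in range(1, n // min_repeats + 1):
        cycle = focus_trace[-cycle_len:]          # candidate repeating suffix
        repeats = 1
        idx = n - 2 * cycle_len
        while idx >= 0 and focus_trace[idx:idx + cycle_len] == cycle:
            repeats += 1
            idx -= cycle_len
        if repeats >= min_repeats:
            return True
    return False


# Example: focus cycles among the same three cells instead of progressing.
assert detect_navigation_loop(
    ["tab_home", "cell_1", "cell_2", "cell_3", "cell_1", "cell_2", "cell_3"])
assert not detect_navigation_loop(["tab_home", "cell_1", "cell_2", "cell_3"])
```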
Related papers
- Automated Soap Opera Testing Directed by LLMs and Scenario Knowledge: Feasibility, Challenges, and Road Ahead [43.15092098658384]
Exploratory testing (ET) harnesses testers' knowledge, creativity, and experience to create varied tests that uncover unexpected bugs from the end user's perspective.
We explore the feasibility, challenges, and road ahead of automated scenario-based ET (a.k.a. soap opera testing).
arXiv Detail & Related papers (2024-12-11T17:57:23Z)
- Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.
Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.
Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
- ASTER: Natural and Multi-language Unit Test Generation with LLMs [6.259245181881262]
We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases.
We conduct an empirical study to assess the quality of the generated tests in terms of code coverage and test naturalness.
arXiv Detail & Related papers (2024-09-04T21:46:18Z)
- Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLM tool use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
- Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training [55.321010757641524]
A major public concern regarding the training of large language models (LLMs) is whether they abuse copyrighted online text.
Previous membership inference methods may be misled by similar examples in vast amounts of training data.
We propose an alternative insert-and-detection methodology, advocating that web users and content platforms employ unique identifiers.
arXiv Detail & Related papers (2024-03-23T06:36:32Z)
- Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques.
We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z)
- Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing [0.0]
A Large Language Model (LLM) is a cutting-edge artificial intelligence model that generates coherent content.
LLMs can play a pivotal role in software development, including software testing.
This study explores the practical application of LLMs in software testing within an industrial setting.
arXiv Detail & Related papers (2023-12-08T06:30:37Z)
- LLM for Test Script Generation and Migration: Challenges, Capabilities, and Opportunities [8.504639288314063]
Test script generation is a vital component of software testing, enabling efficient and reliable automation of repetitive test tasks.
Existing generation approaches often encounter limitations, such as difficulties in accurately capturing and reproducing test scripts across diverse devices, platforms, and applications.
This paper investigates the application of large language models (LLMs) in the domain of mobile application test script generation.
arXiv Detail & Related papers (2023-09-24T07:58:57Z)
- Towards Autonomous Testing Agents via Conversational Large Language Models [18.302956037305112]
Large language models (LLMs) can be used as automated testing assistants.
We present a taxonomy of LLM-based testing agents based on their level of autonomy.
arXiv Detail & Related papers (2023-06-08T12:22:38Z)
- UKP-SQUARE: An Online Platform for Question Answering Research [50.35348764297317]
We present UKP-SQUARE, an online QA platform for researchers that allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated tests.
arXiv Detail & Related papers (2022-03-25T15:00:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.