AXNav: Replaying Accessibility Tests from Natural Language
- URL: http://arxiv.org/abs/2310.02424v3
- Date: Tue, 5 Mar 2024 01:28:25 GMT
- Title: AXNav: Replaying Accessibility Tests from Natural Language
- Authors: Maryam Taeb, Amanda Swearngin, Eldon Schoop, Ruijia Cheng, Yue Jiang,
Jeffrey Nichols
- Abstract summary: Large Language Models (LLMs) have been used for a variety of tasks including automation of UIs.
This paper explores the requirements of a natural language based accessibility testing workflow.
We build a system that takes as input a manual accessibility test (e.g., Search for a show in VoiceOver'') and uses an LLM combined with pixel-based UI Understanding models to execute the test.
- Score: 14.131076040673351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers and quality assurance testers often rely on manual testing to test
accessibility features throughout the product lifecycle. Unfortunately, manual
testing can be tedious, often has an overwhelming scope, and can be difficult
to schedule amongst other development milestones. Recently, Large Language
Models (LLMs) have been used for a variety of tasks including automation of
UIs, however to our knowledge no one has yet explored their use in controlling
assistive technologies for the purposes of supporting accessibility testing. In
this paper, we explore the requirements of a natural language based
accessibility testing workflow, starting with a formative study. From this we
build a system that takes as input a manual accessibility test (e.g., ``Search
for a show in VoiceOver'') and uses an LLM combined with pixel-based UI
Understanding models to execute the test and produce a chaptered, navigable
video. In each video, to help QA testers we apply heuristics to detect and flag
accessibility issues (e.g., Text size not increasing with Large Text enabled,
VoiceOver navigation loops). We evaluate this system through a 10 participant
user study with accessibility QA professionals who indicated that the tool
would be very useful in their current work and performed tests similarly to how
they would manually test the features. The study also reveals insights for
future work on using LLMs for accessibility testing.
Related papers
- TESTEVAL: Benchmarking Large Language Models for Test Case Generation [15.343859279282848]
We propose TESTEVAL, a novel benchmark for test case generation with large language models (LLMs)
We collect 210 Python programs from an online programming platform, LeetCode, and design three different tasks: overall coverage, targeted line/branch coverage, and targeted path coverage.
We find that generating test cases to cover specific program lines/branches/paths is still challenging for current LLMs.
arXiv Detail & Related papers (2024-06-06T22:07:50Z) - Are We Testing or Being Tested? Exploring the Practical Applications of
Large Language Models in Software Testing [0.0]
A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content.
LLM can play a pivotal role in software development, including software testing.
This study explores the practical application of LLMs in software testing within an industrial setting.
arXiv Detail & Related papers (2023-12-08T06:30:37Z) - Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI
Testing via Functionality-aware Decisions [23.460051600514806]
GPTDroid is a Q&A-based GUI testing framework for mobile apps.
We introduce a functionality-aware memory prompting mechanism.
It outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate.
arXiv Detail & Related papers (2023-10-24T12:30:26Z) - LLM for Test Script Generation and Migration: Challenges, Capabilities,
and Opportunities [8.504639288314063]
Test script generation is a vital component of software testing, enabling efficient and reliable automation of repetitive test tasks.
Existing generation approaches often encounter limitations, such as difficulties in accurately capturing and reproducing test scripts across diverse devices, platforms, and applications.
This paper investigates the application of large language models (LLM) in the domain of mobile application test script generation.
arXiv Detail & Related papers (2023-09-24T07:58:57Z) - Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing
Perspective [63.92197404447808]
Large language models (LLMs) have shown some human-like cognitive abilities.
We propose an adaptive testing framework for LLM evaluation.
This approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Towards Autonomous Testing Agents via Conversational Large Language
Models [18.302956037305112]
Large language models (LLMs) can be used as automated testing assistants.
We present a taxonomy of LLM-based testing agents based on their level of autonomy.
arXiv Detail & Related papers (2023-06-08T12:22:38Z) - BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models [73.29106813131818]
bias testing is currently cumbersome since the test sentences are generated from a limited set of manual templates or need expensive crowd-sourcing.
We propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes.
We present an open-source comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing.
arXiv Detail & Related papers (2023-02-14T22:07:57Z) - Towards Informed Design and Validation Assistance in Computer Games
Using Imitation Learning [65.12226891589592]
This paper proposes a new approach to automated game validation and testing.
Our method leverages a data-driven imitation learning technique, which requires little effort and time and no knowledge of machine learning or programming.
arXiv Detail & Related papers (2022-08-15T11:08:44Z) - ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named as ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z) - UKP-SQUARE: An Online Platform for Question Answering Research [50.35348764297317]
We present UKP-SQUARE, an online QA platform for researchers which allows users to query and analyze a large collection of modern Skills.
UKP-SQUARE allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated tests.
arXiv Detail & Related papers (2022-03-25T15:00:24Z) - Look Before you Speak: Visually Contextualized Utterances [88.58909442073858]
We create a task for predicting utterances in a video using both visual frames and transcribed speech as context.
By exploiting the large number of instructional videos online, we train a model to solve this task at scale, without the need for manual annotations.
Our model achieves state-of-the-art performance on a number of downstream VideoQA benchmarks.
arXiv Detail & Related papers (2020-12-10T14:47:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.