Will AI also replace inspectors? Investigating the potential of generative AIs in usability inspection
- URL: http://arxiv.org/abs/2510.17056v1
- Date: Sun, 19 Oct 2025 23:59:15 GMT
- Title: Will AI also replace inspectors? Investigating the potential of generative AIs in usability inspection
- Authors: Luis F. G. Campos, Leonardo C. Marques, Walter T. Nakamura,
- Abstract summary: This study examines the performance of generative AIs in identifying usability problems, comparing their performance with that of experienced human inspectors. While inspectors achieved the highest levels of precision and overall coverage, the AIs demonstrated high individual performance and discovered many novel defects, but with a higher rate of false positives and redundant reports. These findings suggest that AI, in its current stage, cannot replace human inspectors but can serve as a valuable augmentation tool to improve efficiency and expand defect coverage.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Usability inspection is a well-established technique for identifying interaction issues in software interfaces, thereby contributing to improved product quality. However, it is a costly process that requires time and specialized knowledge from inspectors. With advances in Artificial Intelligence (AI), new opportunities have emerged to support this task, particularly through generative models capable of interpreting interfaces and performing inspections more efficiently. This study examines the performance of generative AIs in identifying usability problems, comparing their performance with that of experienced human inspectors. A software prototype was evaluated by four specialists and two AI models (GPT-4o and Gemini 2.5 Flash), using metrics such as precision, recall, and F1-score. While inspectors achieved the highest levels of precision and overall coverage, the AIs demonstrated high individual performance and discovered many novel defects, but with a higher rate of false positives and redundant reports. The combination of AIs and human inspectors produced the best results, revealing their complementarity. These findings suggest that AI, in its current stage, cannot replace human inspectors but can serve as a valuable augmentation tool to improve efficiency and expand defect coverage. The results provide evidence based on quantitative analysis to inform the discussion on the role of AI in usability inspections, pointing to viable paths for its complementary use in software quality assessment contexts.
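The metrics named in the abstract (precision, recall, F1-score) can be made concrete with a short sketch. The snippet below is a minimal illustration of how one inspector's or model's defect reports might be scored against a consolidated ground-truth defect list; the defect identifiers and set contents are hypothetical, not data from the paper.

```python
# Minimal sketch of the precision/recall/F1 scoring described in the abstract.
# All defect IDs and set contents below are hypothetical illustrations.

def evaluate(reported: set[str], ground_truth: set[str]) -> dict[str, float]:
    """Score one inspector's (or AI model's) defect reports against the
    consolidated list of real usability problems."""
    tp = len(reported & ground_truth)    # real defects correctly reported
    fp = len(reported - ground_truth)    # false positives (spurious reports)
    fn = len(ground_truth - reported)    # real defects missed

    precision = tp / (tp + fp) if reported else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# A model that finds several real defects but also reports spurious ones,
# mirroring the false-positive pattern the study attributes to the AIs.
ground_truth = {"D01", "D02", "D03", "D04", "D05"}
ai_reports = {"D01", "D02", "D03", "D07", "D08", "D09"}
print(evaluate(ai_reports, ground_truth))   # precision 0.50, recall 0.60

# Combining AI and human reports expands coverage, echoing the paper's
# complementarity finding: the union recovers every ground-truth defect here.
human_reports = {"D02", "D04", "D05"}
print(evaluate(ai_reports | human_reports, ground_truth))  # recall 1.00
```

In the union case of this toy example, recall reaches 1.0 while precision drops below the humans-only level, consistent with the abstract's claim that combining AIs and inspectors expands defect coverage rather than replacing human judgment.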
Related papers
- Generative AI in Software Testing: Current Trends and Future Directions [1.0312968200748118]
This paper investigates current software testing systems and explores how artificial intelligence, specifically Generative AI, can be integrated to enhance these systems. It focuses on the potential of Generative AI to transform software testing processes by improving test coverage, increasing efficiency, and reducing costs.
arXiv Detail & Related papers (2026-03-02T18:01:43Z) - How Students Use Generative AI for Software Testing: An Observational Study [3.2402950370430497]
This study investigates how novice software developers interact with generative AI for engineering unit tests. We identified four interaction strategies, defined by whether the test idea or the test implementation originated from generative AI or the participant. Students reported benefits including time-saving, reduced cognitive load, and support for test ideation, but also noted drawbacks such as diminished trust, test quality concerns, and lack of ownership.
arXiv Detail & Related papers (2025-10-12T11:31:41Z) - Explainable AI for Collaborative Assessment of 2D/3D Registration Quality [50.65650507103078]
We propose the first artificial intelligence framework trained specifically for 2D/3D registration quality verification. Our explainable AI (XAI) approach aims to enhance informed decision-making for human operators.
arXiv Detail & Related papers (2025-07-23T15:28:57Z) - The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting area chairs (ACs) in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z) - General Scales Unlock AI Evaluation with Explanatory and Predictive Power [57.7995945974989]
Benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems. We introduce general scales for AI evaluation that can explain what common AI benchmarks really measure. Our fully automated methodology builds on 18 newly crafted rubrics that place instance demands on general scales that do not saturate.
arXiv Detail & Related papers (2025-03-09T01:13:56Z) - Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes. We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - Human-AI Collaborative Game Testing with Vision Language Models [0.0]
This study investigates how AI can improve game testing by developing and experimenting with an AI-assisted workflow. We evaluate the effectiveness of AI assistance under four conditions: with or without AI support, and with or without detailed knowledge of defects and design documentation. Results indicate that AI assistance significantly improves defect identification performance, particularly when paired with detailed knowledge.
arXiv Detail & Related papers (2025-01-20T23:14:23Z) - To Err Is AI! Debugging as an Intervention to Facilitate Appropriate Reliance on AI Systems [11.690126756498223]
The vision of optimal human-AI collaboration requires 'appropriate reliance' of humans on AI systems.
In practice, the performance disparity of machine learning models on out-of-distribution data makes dataset-specific performance feedback unreliable.
arXiv Detail & Related papers (2024-09-22T09:43:27Z) - AI-powered software testing tools: A systematic review and empirical assessment of their features and limitations [1.0344642971058589]
AI-driven test automation tools show strong potential in improving software quality and reducing manual testing effort. Future research should focus on advancing AI models to improve adaptability, reliability, and robustness in software testing.
arXiv Detail & Related papers (2024-08-31T10:10:45Z) - The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z) - Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI.
We show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)