Quality Assurance for Artificial Intelligence: A Study of Industrial
Concerns, Challenges and Best Practices
- URL: http://arxiv.org/abs/2402.16391v1
- Date: Mon, 26 Feb 2024 08:31:45 GMT
- Title: Quality Assurance for Artificial Intelligence: A Study of Industrial
Concerns, Challenges and Best Practices
- Authors: Chenyu Wang, Zhou Yang, Ze Shi Li, Daniela Damian, David Lo
- Abstract summary: We report on the challenges and best practices of quality assurance for AI systems (QA4AI).
Our findings suggest correctness as the most important property, followed by model relevance, efficiency and deployability.
We identify 21 QA4AI practices across each stage of AI development.
- Score: 14.222404866137756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quality Assurance (QA) aims to prevent mistakes and defects in manufactured
products and avoid problems when delivering products or services to customers.
QA for AI systems, however, poses particular challenges, given their
data-driven and non-deterministic nature as well as more complex architectures
and algorithms. While there is growing empirical evidence about practices of
machine learning in industrial contexts, little is known about the challenges
and best practices of quality assurance for AI systems (QA4AI). In this paper,
we report on a mixed-method study of QA4AI in industry practice from various
countries and companies. Through interviews with fifteen industry practitioners
and a validation survey with 50 practitioner responses, we studied the concerns
as well as challenges and best practices in ensuring the QA4AI properties
reported in the literature, such as correctness, fairness, interpretability and
others. Our findings suggest correctness as the most important property,
followed by model relevance, efficiency and deployability. In contrast,
transferability (applying knowledge learned in one task to another task),
security and fairness are not paid much attention by practitioners compared to
other properties. Challenges and solutions are identified for each QA4AI
property. For example, interviewees highlighted the trade-off challenge among
latency, cost and accuracy for efficiency (latency and cost are parts of
efficiency concern). Solutions like model compression are proposed. We
identified 21 QA4AI practices across each stage of AI development, with 10
practices being well recognized and another 8 practices being marginally agreed
by the survey practitioners.
Related papers
- Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation [2.2241228857601727]
This paper presents an interdisciplinary meta-review of about 100 studies that discuss shortcomings in quantitative benchmarking practices.
It brings together many fine-grained issues in the design and application of benchmarks with broader sociotechnical issues.
Our review also highlights a series of systemic flaws in current practices, such as misaligned incentives, construct validity issues, unknown unknowns, and problems with the gaming of benchmark results.
arXiv Detail & Related papers (2025-02-10T15:25:06Z)
- An Empirical Study on Decision-Making Aspects in Responsible Software Engineering for AI [5.564793925574796]
This study investigates the ethical challenges and complexities inherent in responsible software engineering (RSE) for AI.
Personal values, emerging roles, and awareness of AI's societal impact influence responsible decision-making in RSE for AI.
arXiv Detail & Related papers (2025-01-26T22:38:04Z)
- Bridging the Communication Gap: Evaluating AI Labeling Practices for Trustworthy AI Development [41.64451715899638]
High-level AI labels, inspired by frameworks like EU energy labels, have been proposed to make the properties of AI models more transparent.
This study evaluates AI labeling through qualitative interviews along four key research questions.
arXiv Detail & Related papers (2025-01-21T06:00:14Z)
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI [112.0812059747033]
o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance.
The model excelled in tasks requiring intricate reasoning and knowledge integration across various fields.
Overall results indicate significant progress towards artificial general intelligence.
arXiv Detail & Related papers (2024-09-27T06:57:00Z)
- Comprehensive Overview of Artificial Intelligence Applications in Modern Industries [0.3374875022248866]
This paper explores the applications of AI across four key sectors: healthcare, finance, manufacturing, and retail.
We discuss the implications of AI integration, including ethical considerations, the future trajectory of AI development, and its potential to drive economic growth.
arXiv Detail & Related papers (2024-09-19T19:22:52Z)
- Trustworthy and Responsible AI for Human-Centric Autonomous Decision-Making Systems [2.444630714797783]
We review and discuss the intricacies of AI biases, definitions, methods of detection and mitigation, and metrics for evaluating bias.
We also discuss open challenges with regard to the trustworthiness and widespread application of AI across diverse domains of human-centric decision making.
arXiv Detail & Related papers (2024-08-28T06:04:25Z)
- OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI [73.75520820608232]
We introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities.
These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage.
Our evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration.
arXiv Detail & Related papers (2024-06-18T16:20:53Z)
- Artificial Intelligence in Industry 4.0: A Review of Integration Challenges for Industrial Systems [45.31340537171788]
Cyber-Physical Systems (CPS) generate vast data sets that can be leveraged by Artificial Intelligence (AI) for applications including predictive maintenance and production planning.
Despite the demonstrated potential of AI, its widespread adoption in sectors like manufacturing remains limited.
arXiv Detail & Related papers (2024-05-28T20:54:41Z)
- Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs).
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z)
- Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered.
Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline in problems after September 2021 consistently across all the difficulties and types of problems.
arXiv Detail & Related papers (2023-12-04T18:58:57Z)
- Towards Implementing Responsible AI [22.514717870367623]
We propose four aspects of AI system design and development, adapting processes used in software engineering.
The salient findings cover four aspects of AI system design and development, adapting processes used in software engineering.
arXiv Detail & Related papers (2022-05-09T14:59:23Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, an inability to explain decisions, and bias in training data are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
- Quality Management of Machine Learning Systems [0.0]
Artificial Intelligence (AI) has become a part of our daily lives due to major advances in Machine Learning (ML) techniques.
For business/mission-critical systems, serious concerns about reliability and maintainability of AI applications remain.
This paper presents a view of a holistic quality management framework for ML applications based on the current advances.
arXiv Detail & Related papers (2020-06-16T21:34:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.