How to Evaluate Explainability? -- A Case for Three Criteria
- URL: http://arxiv.org/abs/2209.00366v1
- Date: Thu, 1 Sep 2022 11:22:50 GMT
- Title: How to Evaluate Explainability? -- A Case for Three Criteria
- Authors: Timo Speith
- Abstract summary: We will provide a multidisciplinary motivation for three quality criteria concerning the information that systems should provide.
Our aim is to fuel the discussion regarding these criteria, such that adequate evaluation methods for them will be conceived.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The increasing complexity of software systems and the influence of
software-supported decisions in our society have sparked the need for software
that is safe, reliable, and fair. Explainability has been identified as a means
to achieve these qualities. It is recognized as an emerging non-functional
requirement (NFR) that has a significant impact on system quality. However, in
order to develop explainable systems, we need to understand when a system
satisfies this NFR. To this end, appropriate evaluation methods are required.
However, the field is crowded with evaluation methods, and there is no
consensus on which are the "right" ones. Worse, there is not even agreement
on which criteria should be evaluated. In this vision paper, we will provide a
multidisciplinary motivation for three such quality criteria concerning the
information that systems should provide: comprehensibility, fidelity, and
assessability. Our aim is to fuel the discussion regarding these criteria,
such that adequate evaluation methods for them will be conceived.
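To make one of the three criteria concrete, fidelity is often operationalized as the degree to which an explanation (e.g. an interpretable surrogate rule) agrees with the behavior of the model it explains. The following is a minimal illustrative sketch under that assumption; the models, the grid of inputs, and the function names are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not from the paper): fidelity as the fraction of
# inputs on which an interpretable surrogate agrees with a black-box model.

def black_box(x):
    # Stand-in for an opaque model: classifies by a nonlinear product rule.
    return 1 if x[0] * x[1] > 0.25 else 0

def surrogate(x):
    # Simple interpretable rule offered as an "explanation" of black_box.
    return 1 if x[0] > 0.5 and x[1] > 0.5 else 0

def fidelity(model, explanation, samples):
    # Fraction of sample inputs on which the explanation mimics the model.
    agree = sum(model(x) == explanation(x) for x in samples)
    return agree / len(samples)

# Evaluate agreement on a coarse grid over the unit square.
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
print(round(fidelity(black_box, surrogate, grid), 3))  # → 0.818
```

A fidelity near 1.0 would indicate the surrogate faithfully tracks the model on the evaluated inputs; the gap here comes from regions where the product rule and the threshold rule disagree.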
Related papers
- Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
- Functional trustworthiness of AI systems by statistically valid testing [7.717286312400472]
The authors are concerned about the safety, health, and rights of the European citizens due to inadequate measures and procedures required by the current draft of the EU Artificial Intelligence (AI) Act.
We observe that not only the current draft of the EU AI Act, but also the accompanying standardization efforts in CEN/CENELEC, have resorted to the position that real functional guarantees of AI systems would supposedly be unrealistic and too complex anyway.
arXiv Detail & Related papers (2023-10-04T11:07:52Z)
- A New Perspective on Evaluation Methods for Explainable Artificial Intelligence (XAI) [0.0]
We argue that it is best approached in a nuanced way that incorporates resource availability, domain characteristics, and considerations of risk.
This work aims to advance the field of Requirements Engineering for AI.
arXiv Detail & Related papers (2023-07-26T15:15:44Z)
- Revisiting the Performance-Explainability Trade-Off in Explainable Artificial Intelligence (XAI) [0.0]
We argue that it is best approached in a nuanced way that incorporates resource availability, domain characteristics, and considerations of risk.
This work aims to advance the field of Requirements Engineering for AI.
arXiv Detail & Related papers (2023-07-26T15:07:40Z)
- Towards Clear Expectations for Uncertainty Estimation [64.20262246029286]
Uncertainty Quantification (UQ) is crucial to achieve trustworthy Machine Learning (ML)
Most UQ methods suffer from disparate and inconsistent evaluation protocols.
This opinion paper offers a new perspective by specifying those requirements through five downstream tasks.
arXiv Detail & Related papers (2022-07-27T07:50:57Z)
- Towards a multi-stakeholder value-based assessment framework for algorithmic systems [76.79703106646967]
We develop a value-based assessment framework that visualizes closeness and tensions between values.
We give guidelines on how to operationalize them, while opening up the evaluation and deliberation process to a wide range of stakeholders.
arXiv Detail & Related papers (2022-05-09T19:28:32Z)
- Tailored Uncertainty Estimation for Deep Learning Systems [10.288326973530614]
We propose a framework that guides the selection of a suitable uncertainty estimation method.
Our framework provides strategies to validate this choice and to uncover structural weaknesses.
It anticipates prospective machine learning regulations that require evidence of the technical appropriateness of machine learning systems.
arXiv Detail & Related papers (2022-04-29T09:23:07Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain decisions, and bias in training data are among the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
- How Trustworthy are Performance Evaluations for Basic Vision Tasks? [46.0590176230731]
This paper examines performance evaluation criteria for basic vision tasks involving sets of objects, namely object detection, instance-level segmentation, and multi-object tracking.
The rankings of algorithms by an existing criterion can fluctuate with different choices of parameters, making their evaluations unreliable.
This work suggests a notion of trustworthiness for performance criteria, which requires (i) robustness to parameters for reliability, (ii) contextual meaningfulness in sanity tests, and (iii) consistency with mathematical requirements such as the metric properties.
arXiv Detail & Related papers (2020-08-08T14:21:15Z)
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? [58.13152510843004]
With the growing popularity of deep-learning based NLP models, comes a need for interpretable systems.
What is interpretability, and what constitutes a high-quality interpretation?
We call for more clearly differentiating among the desired criteria an interpretation should satisfy, and we focus on the faithfulness criterion.
arXiv Detail & Related papers (2020-04-07T20:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.