Evaluative Item-Contrastive Explanations in Rankings
- URL: http://arxiv.org/abs/2312.10094v1
- Date: Thu, 14 Dec 2023 15:40:51 GMT
- Title: Evaluative Item-Contrastive Explanations in Rankings
- Authors: Alessandro Castelnovo, Riccardo Crupi, Nicolò Mombelli, Gabriele Nanino, Daniele Regoli
- Abstract summary: This paper advocates for the application of a specific form of Explainable AI -- namely, contrastive explanations -- as well-suited for addressing ranking problems.
The present work introduces Evaluative Item-Contrastive Explanations tailored for ranking systems and illustrates their application and characteristics through an experiment conducted on publicly available data.
- Score: 47.24529321119513
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The remarkable success of Artificial Intelligence in advancing automated
decision-making is evident both in academia and industry. Within the plethora
of applications, ranking systems hold significant importance in various
domains. This paper advocates for the application of a specific form of
Explainable AI -- namely, contrastive explanations -- as particularly
well-suited for addressing ranking problems. This approach is especially potent
when combined with an Evaluative AI methodology, which conscientiously
evaluates both positive and negative aspects influencing a potential ranking.
Therefore, the present work introduces Evaluative Item-Contrastive Explanations
tailored for ranking systems and illustrates their application and
characteristics through an experiment conducted on publicly available data.
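The abstract stops short of formal details, but the underlying idea can be sketched: to explain why item A is ranked above item B, the gap between their scores is decomposed into factors that favour A (pros) and factors that favour B (cons), so the user can weigh both sides. The Python sketch below is illustrative only; the linear scoring model, function names, and example features are assumptions, not the authors' implementation.

```python
import numpy as np

def item_contrastive_explanation(weights, item_a, item_b, feature_names):
    """Decompose why item_a outranks item_b under an assumed linear scorer.

    Each feature contributes weights * (item_a - item_b) to the score gap:
    positive contributions are pros for item_a, negative ones are cons.
    """
    contributions = weights * (np.asarray(item_a) - np.asarray(item_b))
    pros = {n: c for n, c in zip(feature_names, contributions) if c > 0}
    cons = {n: c for n, c in zip(feature_names, contributions) if c < 0}
    return pros, cons, float(contributions.sum())  # sum equals the score gap

# Hypothetical example: two candidates ranked on three features.
weights = np.array([0.5, 0.3, 0.2])
candidate_a = np.array([0.9, 0.4, 0.7])
candidate_b = np.array([0.6, 0.8, 0.5])
pros, cons, gap = item_contrastive_explanation(
    weights, candidate_a, candidate_b, ["experience", "test_score", "interview"]
)
print("pros:", pros)   # features pushing A above B
print("cons:", cons)   # features pushing B above A
print("gap:", gap)     # net score difference (A minus B)
```

Presenting both pros and cons, rather than only the features that favour the top item, reflects the Evaluative AI framing advocated in the abstract: the end user receives the evidence on each side instead of a one-sided justification.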
Related papers
- AERA Chat: An Interactive Platform for Automated Explainable Student Answer Assessment [12.970776782360366]
AERA Chat is an interactive platform that provides visually explained assessment of student answers.
Users can input questions and student answers to obtain automated, explainable assessment results from large language models.
arXiv Detail & Related papers (2024-10-12T11:57:53Z)
- Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy [66.95501113584541]
Utility and topical relevance are critical measures in information retrieval.
We propose an Iterative utiliTy judgmEnt fraMework to promote each step of the cycle of Retrieval-Augmented Generation.
arXiv Detail & Related papers (2024-06-17T07:52:42Z)
- Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z)
- Hierarchical Evaluation Framework: Best Practices for Human Evaluation [17.91641890651225]
The absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards.
We develop our own hierarchical evaluation framework to provide a more comprehensive representation of the NLP system's performance.
In future work, we will investigate the potential time-saving benefits of our proposed framework for evaluators assessing NLP systems.
arXiv Detail & Related papers (2023-10-03T09:46:02Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- Vote'n'Rank: Revision of Benchmarking with Social Choice Theory [7.224599819499157]
This paper proposes Vote'n'Rank, a framework for ranking systems in multi-task benchmarks under the principles of social choice theory.
We demonstrate that our approach can be efficiently utilised to draw new insights on benchmarking in several ML sub-fields.
arXiv Detail & Related papers (2022-10-11T20:19:11Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
arXiv Detail & Related papers (2022-04-05T19:39:13Z)
- From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI [3.7592122147132776]
We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated for comprehensively assessing the quality of an explanation.
We find that 1 in 3 papers evaluate exclusively with anecdotal evidence, and 1 in 5 papers evaluate with users.
This systematic collection of evaluation methods provides researchers and practitioners with concrete tools to thoroughly validate, benchmark and compare new and existing XAI methods.
arXiv Detail & Related papers (2022-01-20T13:23:20Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.