Multi-Criteria Comparison as a Method of Advancing Knowledge-Guided Machine Learning
- URL: http://arxiv.org/abs/2403.11840v1
- Date: Mon, 18 Mar 2024 14:50:48 GMT
- Title: Multi-Criteria Comparison as a Method of Advancing Knowledge-Guided Machine Learning
- Authors: Jason L. Harman, Jaelle Scheuerman
- Abstract summary: This paper describes a generalizable model evaluation method that can be adapted to evaluate AI/ML models.
The method evaluates a group of candidate models of varying type and structure across multiple scientific, theoretic, and practical criteria.
- Score: 1.6574413179773761
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper describes a generalizable model evaluation method that can be adapted to evaluate AI/ML models across multiple criteria including core scientific principles and more practical outcomes. Emerging from prediction competitions in Psychology and Decision Science, the method evaluates a group of candidate models of varying type and structure across multiple scientific, theoretic, and practical criteria. Ordinal rankings of criteria scores are evaluated using voting rules from the field of computational social choice, allowing divergent measures and types of models to be compared in a holistic evaluation. Additional advantages and applications are discussed.
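The abstract does not name a specific voting rule, so the following is a minimal illustrative sketch, assuming a Borda count is used to aggregate per-criterion ordinal rankings of candidate models; the criterion names, model names, and scores below are hypothetical, not from the paper.

```python
def borda_aggregate(scores_by_criterion, higher_is_better=True):
    """scores_by_criterion: {criterion: {model: score}}.
    Returns total Borda points per model (higher = preferred overall)."""
    totals = {}
    for criterion, scores in scores_by_criterion.items():
        # Rank the models on this criterion; the best-ranked model
        # receives the most points (n - 1), the worst receives 0.
        ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
        n = len(ranked)
        for position, model in enumerate(ranked):
            totals[model] = totals.get(model, 0) + (n - 1 - position)
    return totals

if __name__ == "__main__":
    # Hypothetical example: three candidate models scored on three criteria.
    example = {
        "predictive_accuracy":   {"model_A": 0.91, "model_B": 0.88, "model_C": 0.84},
        "interpretability":      {"model_A": 2.0,  "model_B": 4.5,  "model_C": 3.0},
        "theoretical_grounding": {"model_A": 3.0,  "model_B": 5.0,  "model_C": 4.0},
    }
    totals = borda_aggregate(example)
    print(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Because only the ordinal positions enter the aggregation, other social-choice rules (e.g., Copeland) could be substituted for the Borda step without changing the surrounding structure.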
Related papers
- Evaluatology: The Science and Engineering of Evaluation [11.997673313601423]
This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation.
We propose a universal framework for evaluation, encompassing concepts, terminologies, theories, and methodologies that can be applied across various disciplines.
arXiv Detail & Related papers (2024-03-19T13:38:26Z)
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets [69.91340332545094]
We introduce FLASK, a fine-grained evaluation protocol for both human-based and model-based evaluation.
We experimentally observe that fine-grained evaluation is crucial for attaining a holistic view of model performance.
arXiv Detail & Related papers (2023-07-20T14:56:35Z)
- UMSE: Unified Multi-scenario Summarization Evaluation [52.60867881867428]
Summarization quality evaluation is a non-trivial task in text summarization.
We propose the Unified Multi-scenario Summarization Evaluation Model (UMSE).
Our UMSE is the first unified summarization evaluation framework that can be applied in three evaluation scenarios.
arXiv Detail & Related papers (2023-05-26T12:54:44Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Selection of a representative sorting model in a preference disaggregation setting: a review of existing procedures, new proposals, and experimental comparison [4.447467536572626]
We consider preference disaggregation in the context of multiple criteria sorting.
Given the multiplicity of sorting models compatible with indirect preferences, selecting a single, representative one can be done in several ways.
We present three novel procedures that implement the robust assignment rule in practice.
arXiv Detail & Related papers (2022-08-30T02:01:35Z)
- fairlib: A Unified Framework for Assessing and Improving Classification Fairness [66.27822109651757]
fairlib is an open-source framework for assessing and improving classification fairness.
We implement 14 debiasing methods, including pre-processing, at-training-time, and post-processing approaches.
The built-in metrics cover the most commonly used fairness criteria and can be further generalized and customized for fairness evaluation.
arXiv Detail & Related papers (2022-05-04T03:50:23Z)
- A Meta Survey of Quality Evaluation Criteria in Explanation Methods [0.5801044612920815]
Explanation methods and their evaluation have become a significant issue in explainable artificial intelligence (XAI).
Since the most accurate AI models are opaque with low transparency and comprehensibility, explanations are essential for bias detection and control of uncertainty.
There are a plethora of criteria to choose from when evaluating explanation method quality.
arXiv Detail & Related papers (2022-03-25T22:24:21Z)
- Image Quality Assessment in the Modern Age [53.19271326110551]
This tutorial provides the audience with the basic theories, methodologies, and current progress of image quality assessment (IQA).
We will first revisit several subjective quality assessment methodologies, with emphasis on how to properly select visual stimuli.
Both hand-engineered and (deep) learning-based methods will be covered.
arXiv Detail & Related papers (2021-10-19T02:38:46Z)
- Building an Ensemble of Classifiers via Randomized Models of Ensemble Members [1.827510863075184]
In this paper, a novel randomized model of the base classifier is developed.
In the proposed method, the model's randomness results from randomly selecting the learning set from a family of learning sets of a fixed size.
The DES scheme with the proposed model of competence was experimentally evaluated on a collection of 67 benchmark datasets.
arXiv Detail & Related papers (2021-09-16T10:53:13Z)
- An ensemble learning framework based on group decision making [7.906702226082627]
A framework for the ensemble learning (EL) method based on group decision making (GDM) is proposed.
In this framework, base learners act as decision-makers, the classification categories are the alternatives, and performance measures such as precision, recall, and accuracy are used to reflect each learner's reliability; a sketch of this idea appears after this list.
arXiv Detail & Related papers (2020-07-01T13:18:34Z)
- Marginal likelihood computation for model selection and hypothesis testing: an extensive review [66.37504201165159]
This article provides a comprehensive study of the state-of-the-art of the topic.
We highlight limitations, benefits, connections and differences among the different techniques.
Problems and possible solutions with the use of improper priors are also described.
arXiv Detail & Related papers (2020-05-17T18:31:58Z)
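As referenced in the group-decision-making entry above, here is a minimal sketch of that idea, assuming a weighted plurality vote in which each base classifier's vote is weighted by a validation performance score; the function name, the scikit-learn-style `.predict` interface, and the use of accuracy as the weight are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict

def gdm_predict(classifiers, weights, x):
    """Weighted plurality vote: classifiers act as decision-makers,
    class labels are the alternatives, and each weight reflects a
    classifier's measured performance (e.g., validation accuracy)."""
    votes = defaultdict(float)
    for clf, weight in zip(classifiers, weights):
        label = clf.predict([x])[0]   # each decision-maker picks an alternative
        votes[label] += weight        # its vote counts in proportion to its performance
    return max(votes, key=votes.get)  # alternative with the greatest weighted support
```

Precision or recall on a held-out set could be substituted for accuracy as the weighting measure without changing the voting structure.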