Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms
- URL: http://arxiv.org/abs/2307.07134v1
- Date: Fri, 14 Jul 2023 03:15:56 GMT
- Title: Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms
- Authors: Qi Liu, Zheng Gong, Zhenya Huang, Chuanren Liu, Hengshu Zhu, Zhi Li,
Enhong Chen and Hui Xiong
- Abstract summary: We propose Camilla, a task-agnostic framework for evaluating machine learning algorithms.
We use cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills of each sample.
In our experiments, Camilla outperforms state-of-the-art baselines in metric reliability, rank consistency, and rank stability.
- Score: 88.93372675846123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning algorithms have become ubiquitous in many
applications (e.g., image classification). However, because traditional
metrics measure performance only coarsely (e.g., a single Accuracy score per
classifier), substantial gaps are usually observed between the real-world
performance of these algorithms and their scores in standardized evaluations.
In this paper, inspired by the psychometric theories from human measurement, we
propose a task-agnostic evaluation framework Camilla, where a multi-dimensional
diagnostic metric Ability is defined for collaboratively measuring the
multifaceted strength of each machine learning algorithm. Specifically, given
the response logs from different algorithms to data samples, we leverage
cognitive diagnosis assumptions and neural networks to learn the complex
interactions among algorithms, samples and the skills (explicitly or implicitly
pre-defined) of each sample. In this way, both the abilities of each algorithm
on multiple skills and some of the sample factors (e.g. sample difficulty) can
be simultaneously quantified. We conduct extensive experiments with hundreds of
machine learning algorithms on four public datasets, and our experimental
results demonstrate that Camilla not only captures the pros and cons of each
algorithm more precisely, but also outperforms state-of-the-art baselines in
metric reliability, rank consistency, and rank stability.
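The abstract describes the model only at a high level. As a rough illustration of the cognitive-diagnosis idea it names (per-skill abilities and difficulties interacting through a skill mask to predict response correctness), here is a minimal sketch; the MIRT-style interaction form, the class name, and all hyperparameters are assumptions, not the authors' implementation.
```python
# A minimal, hypothetical sketch of the cognitive-diagnosis idea behind
# Camilla: predict whether algorithm a answers sample s correctly from a
# multi-dimensional Ability vector. NOT the authors' code; the model form
# (a MIRT-style interaction) and all names are assumptions.
import torch
import torch.nn as nn

class TinyCognitiveDiagnosis(nn.Module):
    def __init__(self, n_algos: int, n_samples: int, n_skills: int):
        super().__init__()
        self.ability = nn.Embedding(n_algos, n_skills)        # theta: per-skill ability
        self.difficulty = nn.Embedding(n_samples, n_skills)   # beta: per-skill difficulty
        self.discrimination = nn.Embedding(n_samples, 1)      # how sharply the sample separates

    def forward(self, algo_ids, sample_ids, q_matrix_rows):
        theta = torch.sigmoid(self.ability(algo_ids))         # abilities in [0, 1]
        beta = torch.sigmoid(self.difficulty(sample_ids))
        disc = torch.sigmoid(self.discrimination(sample_ids))
        # Only skills flagged in the sample's Q-matrix row contribute.
        interaction = (q_matrix_rows * (theta - beta)).sum(dim=1, keepdim=True)
        return torch.sigmoid(disc * interaction).squeeze(-1)  # P(correct response)

# Toy usage: 3 algorithms, 5 samples, 2 skills, binary response logs.
model = TinyCognitiveDiagnosis(n_algos=3, n_samples=5, n_skills=2)
algo = torch.tensor([0, 1, 2])
samp = torch.tensor([0, 0, 0])
q = torch.tensor([[1.0, 0.0]] * 3)             # sample 0 requires skill 0 only
y = torch.tensor([1.0, 0.0, 1.0])              # observed correctness
loss = nn.BCELoss()(model(algo, samp, q), y)
loss.backward()                                # abilities/difficulties get gradients
```
After fitting on response logs, the sigmoid-squashed ability embeddings play the role of the multi-dimensional Ability metric, and the difficulty embeddings recover sample factors such as difficulty.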
Related papers
- A General Online Algorithm for Optimizing Complex Performance Metrics [5.726378955570775]
We introduce and analyze a general online algorithm that can be used in a straightforward way with a variety of complex performance metrics in binary, multi-class, and multi-label classification problems.
The algorithm's update and prediction rules are appealingly simple and computationally efficient without the need to store any past data.
arXiv Detail & Related papers (2024-06-20T21:24:47Z)
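The entry above does not spell out its update rules. To give a flavor of online optimization of a complex metric without storing past data, here is the classic online F-measure rule (predict positive when the score exceeds half the running F1); it is a well-known illustration of the setting, not this paper's algorithm.
```python
# A sketch of online optimization of a complex metric from running
# statistics only: the classic online F-measure rule thresholds the
# predicted probability at F_t / 2, where F_t is the running F1.
# An illustration of the setting, not the paper's exact algorithm.
def online_f1(stream):
    tp = fp = fn = 0
    for p, y in stream:                  # p: predicted P(y=1), y: true label in {0,1}
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        y_hat = 1 if p >= f1 / 2 else 0  # threshold derived from running F1
        tp += y_hat & y
        fp += y_hat & (1 - y)
        fn += (1 - y_hat) & y
        yield y_hat, f1

# Toy stream of (probability, label) pairs.
stream = [(0.9, 1), (0.2, 0), (0.7, 1), (0.4, 0), (0.6, 1)]
for y_hat, f1 in online_f1(stream):
    print(y_hat, round(f1, 3))
```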
- Matched Machine Learning: A Generalized Framework for Treatment Effect Inference With Learned Metrics [87.05961347040237]
We introduce Matched Machine Learning, a framework that combines the flexibility of machine learning black boxes with the interpretability of matching.
Our framework uses machine learning to learn an optimal metric for matching units and estimating outcomes.
We show empirically that instances of Matched Machine Learning perform on par with black-box machine learning methods and better than existing matching methods for similar problems.
arXiv Detail & Related papers (2023-04-03T19:32:30Z)
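To make the matching pipeline above concrete, here is a minimal sketch: an outcome model supplies a crude learned metric (feature importances, an assumption made here purely for illustration), treated units are matched to their nearest controls in the re-weighted space, and the effect is estimated from matched outcome differences.
```python
# A minimal sketch of the matching idea: learn a metric with a black-box
# outcome model, then match treated units to controls under that metric.
# Weighting by feature importance is an illustrative assumption, not the
# paper's learned metric.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
t = rng.integers(0, 2, size=n)                         # treatment indicator
y = X[:, 0] + 2.0 * t + rng.normal(scale=0.1, size=n)  # true effect = 2.0

# Step 1: learn which features matter for the outcome.
outcome_model = RandomForestRegressor(random_state=0).fit(X, y)
w = outcome_model.feature_importances_                 # crude learned metric

# Step 2: match each treated unit to its nearest control under the metric.
Xw = X * np.sqrt(w)                                    # weighted Euclidean space
controls = NearestNeighbors(n_neighbors=1).fit(Xw[t == 0])
_, idx = controls.kneighbors(Xw[t == 1])

# Step 3: average treatment effect on the treated via matched differences.
att = np.mean(y[t == 1] - y[t == 0][idx.ravel()])
print(f"estimated ATT ~ {att:.2f} (true effect 2.0)")
```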
- Encoding of data sets and algorithms [0.0]
In many high-impact applications, it is important to ensure the quality of a machine learning algorithm's output.
We have initiated a mathematically rigorous theory to decide which models are close to each other in terms of certain metrics.
A threshold metric acting on this grid of encodings expresses the nearness (or statistical distance) of each algorithm and data set of interest to any given application.
arXiv Detail & Related papers (2023-03-02T05:29:27Z)
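A minimal sketch of how a threshold metric over such encodings might look in practice: algorithms are compared through a statistical distance between their prediction distributions on a data set, and a threshold declares two of them near. The use of Jensen-Shannon distance and the 0.1 threshold are illustrative assumptions, not the paper's construction.
```python
# A sketch of a "threshold metric on a grid" of algorithms: two algorithms
# are called close when a statistical distance between their predicted
# label distributions falls below a threshold. Jensen-Shannon distance and
# the 0.1 cutoff are assumptions made for illustration.
import numpy as np
from scipy.spatial.distance import jensenshannon

def class_histogram(predictions, n_classes):
    """Empirical distribution of predicted labels."""
    counts = np.bincount(predictions, minlength=n_classes)
    return counts / counts.sum()

def nearness_grid(all_predictions, n_classes, threshold=0.1):
    """Pairwise JS distances between algorithms; True where 'near'."""
    hists = [class_histogram(p, n_classes) for p in all_predictions]
    k = len(hists)
    dist = np.array([[jensenshannon(hists[i], hists[j]) for j in range(k)]
                     for i in range(k)])
    return dist, dist < threshold

# Toy example: three algorithms' predicted labels on the same data set.
preds = [np.array([0, 1, 1, 2, 0]), np.array([0, 1, 1, 2, 1]),
         np.array([2, 2, 2, 2, 2])]
dist, near = nearness_grid(preds, n_classes=3)
print(np.round(dist, 3), near, sep="\n")
```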
- Differential testing for machine learning: an analysis for classification algorithms beyond deep learning [7.081604594416339]
We conduct a case study using Scikit-learn, Weka, Spark MLlib, and Caret.
We identify the potential of differential testing by considering which algorithms are available in multiple frameworks.
Feasibility seems limited, however, because it is often impossible to determine matching configurations across frameworks (the sketch below illustrates this pitfall).
arXiv Detail & Related papers (2022-07-25T08:27:01Z)
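The configuration problem noted above can be seen in miniature by differentially testing logistic regression across scikit-learn and statsmodels: the two frameworks only agree once scikit-learn's default L2 regularization is effectively disabled. This sketch illustrates the testing approach, not the study's actual harness.
```python
# Differential testing across frameworks: fit the "same" logistic
# regression in scikit-learn and statsmodels and compare probabilities.
# scikit-learn applies L2 regularization by default, statsmodels does not,
# so the configurations must be aligned by hand.
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# scikit-learn: a huge C approximates "no regularization".
sk = LogisticRegression(C=1e10, max_iter=1000).fit(X, y)
p_sklearn = sk.predict_proba(X)[:, 1]

# statsmodels: unregularized MLE; needs an explicit intercept column.
sm_model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
p_statsmodels = sm_model.predict(sm.add_constant(X))

gap = np.max(np.abs(p_sklearn - p_statsmodels))
print(f"max probability gap: {gap:.2e}")   # small only because C was tuned
```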
- Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various representation-learning evaluation protocols.
arXiv Detail & Related papers (2022-06-16T11:44:11Z)
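One standard representation-learning protocol of the kind referenced above is a linear probe: freeze the features a CIL model has learned and fit a linear classifier on top. In the sketch below, `extract_features` is a hypothetical stand-in for a frozen CIL backbone (simulated here), so only the protocol itself should be taken from the example.
```python
# A sketch of one representation-learning protocol for evaluating CIL
# models: a linear probe on frozen features. `extract_features` is a
# hypothetical stand-in for a frozen CIL backbone, simulated here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def extract_features(images, rng):
    """Placeholder: a real protocol would run the frozen CIL backbone."""
    return rng.normal(size=(len(images), 64)) + images[:, None]

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 50)          # 10 classes seen so far
feats = extract_features(labels.astype(float), rng)

# Probe quality of the frozen representation with a linear classifier.
f_tr, f_te, y_tr, y_te = train_test_split(feats, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(f_tr, y_tr)
print(f"linear-probe accuracy: {probe.score(f_te, y_te):.3f}")
```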
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation across the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
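A minimal sketch of the partially-automated setup described above: a detector auto-labels confident items and routes uncertain ones to the analyst. The 0.8 confidence cutoff and the scores are assumptions for illustration; the entry's stance/sentiment models would supply the probabilities in practice.
```python
# A minimal human-in-the-loop triage sketch: a model auto-labels
# confident cases and routes uncertain ones to an analyst. The cutoff
# and scores are illustrative assumptions.
def triage(items, confidence_cutoff=0.8):
    auto, to_analyst = [], []
    for text, p_disinfo in items:          # p_disinfo from a stance/sentiment model
        confidence = max(p_disinfo, 1 - p_disinfo)
        if confidence >= confidence_cutoff:
            auto.append((text, p_disinfo >= 0.5))
        else:
            to_analyst.append(text)        # analyst-in-the-loop decides
    return auto, to_analyst

items = [("claim A", 0.95), ("claim B", 0.55), ("claim C", 0.10)]
auto, queue = triage(items)
print(auto)    # [('claim A', True), ('claim C', False)]
print(queue)   # ['claim B']
```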
- Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
arXiv Detail & Related papers (2021-01-17T10:29:29Z)
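The paper's measures rely on a linearized network; in the analogous and fully tractable setting of ridge regression, the effect of removing one sample on the learned weights has a closed form, which yields a simple informativeness score. The sketch below uses that stand-in, not the paper's exact quantities.
```python
# Sample informativeness in a tractable setting: for ridge regression
# (a stand-in for the paper's linearized network), the leave-one-out
# weight change has a closed form via the Sherman-Morrison identity.
# The lambda value is an arbitrary assumption.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 100, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=0.1, size=n)
y[0] += 5.0                                   # make sample 0 an outlier

A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
w = A_inv @ X.T @ y                           # ridge solution
h = np.einsum("ij,jk,ik->i", X, A_inv, X)     # leverage h_i = x_i^T A_inv x_i
r = y - X @ w                                 # residuals

# Leave-one-out weight change: w - w_{-i} = A_inv x_i r_i / (1 - h_i).
dw = (X @ A_inv) * (r / (1 - h))[:, None]     # row i is w - w_{-i}
info = np.linalg.norm(dw, axis=1)             # informativeness per sample
print("most informative sample:", info.argmax())  # expect 0 (the outlier)
```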
- Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z)
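One metric commonly used in this line of benchmarking is the classifier two-sample test (C2ST): a classifier is trained to distinguish an algorithm's posterior samples from reference samples, and accuracy near 0.5 indicates a good approximation. The MLP classifier and sample sizes below are assumptions for illustration.
```python
# A sketch of one SBI benchmarking metric: the classifier two-sample
# test (C2ST). Accuracy near 0.5 means the algorithm's posterior samples
# are indistinguishable from the reference posterior.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def c2st(reference_samples, approx_samples, seed=0):
    X = np.vstack([reference_samples, approx_samples])
    y = np.r_[np.zeros(len(reference_samples)), np.ones(len(approx_samples))]
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    return cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 2))   # "true" posterior samples
good = rng.normal(0.0, 1.0, size=(500, 2))        # well-matched algorithm
bad = rng.normal(1.0, 1.0, size=(500, 2))         # biased algorithm
print(f"good algorithm C2ST: {c2st(reference, good):.2f}")  # ~0.5
print(f"bad algorithm C2ST:  {c2st(reference, bad):.2f}")   # >> 0.5
```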
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
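The distribution-dependence claim above can be reproduced in a toy simulation: the same linear classifier, with identical class-mean separation but different class covariances, reaches very different accuracies. The generating distributions and numbers below are illustrative only, not the paper's asymptotic setting.
```python
# A small simulation of distribution-dependent accuracy: the same linear
# classifier, identical class-mean separation, different class variances.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def accuracy_for_scale(scale, seed=0):
    rng = np.random.default_rng(seed)
    means = np.eye(3) * 2.0                      # three classes in R^3
    X = np.vstack([rng.normal(m, scale, size=(300, 3)) for m in means])
    y = np.repeat(np.arange(3), 300)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

print(f"low-variance classes:  {accuracy_for_scale(0.5):.2f}")   # near 1.0
print(f"high-variance classes: {accuracy_for_scale(2.0):.2f}")   # much lower
```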
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.