Machine Learning Capability: A standardized metric using case difficulty
with applications to individualized deployment of supervised machine learning
- URL: http://arxiv.org/abs/2302.04386v1
- Date: Thu, 9 Feb 2023 00:38:42 GMT
- Title: Machine Learning Capability: A standardized metric using case difficulty
with applications to individualized deployment of supervised machine learning
- Authors: Adrienne Kline and Joon Lee
- Abstract summary: Model evaluation is a critical component in supervised machine learning classification analyses.
Item Response Theory (IRT) and Computer Adaptive Testing (CAT) with machine learning can benchmark datasets independently of the end-classification results.
- Score: 2.2060666847121864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model evaluation is a critical component of supervised machine learning classification analyses. Traditional metrics do not currently incorporate case difficulty, which leaves classification results unbenchmarked with respect to generalization. Item Response Theory (IRT) and Computer Adaptive Testing (CAT) combined with machine learning can benchmark datasets independently of the end-classification results, yielding detailed case-level information for evaluation. To showcase the approach, two datasets were used: 1) health-related and 2) physical science. A two-parameter IRT model was used for the health dataset, and a polytomous IRT model for the physical science dataset, to analyze predictive features and place each case on a difficulty continuum. A CAT approach was then used to ascertain each algorithm's performance and applicability to new data. This method provides an efficient way to benchmark data, using only a fraction of the dataset (less than 1%) while being 22-60x more computationally efficient than traditional metrics. This novel metric, termed Machine Learning Capability (MLC), has additional benefits: it is unbiased with respect to the outcome classification and offers a standardized way to make model comparisons within and across datasets. MLC quantifies the limitations of supervised machine learning algorithms; in situations where an algorithm falls short, other input(s) are required for decision-making.
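The paper itself provides no code here; as a rough, hypothetical sketch of the two ingredients the abstract names, the Python below fits a two-parameter logistic (2PL) IRT model to a binary matrix of per-case classifier correctness (placing each case on a difficulty continuum) and then picks the next case a CAT procedure would administer by maximizing Fisher information. The function names, the gradient-ascent fitting routine, and the toy data are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): estimate 2PL IRT parameters from a
# binary "correctness" matrix, where rows are algorithms ("examinees") and
# columns are cases ("items"); b_j is case difficulty, a_j is discrimination.
import numpy as np

def fit_2pl(correct, n_iter=2000, lr=0.05):
    """correct: (n_models, n_cases) array of 0/1 outcomes."""
    n_models, n_cases = correct.shape
    theta = np.zeros(n_models)      # model ability
    a = np.ones(n_cases)            # case discrimination
    b = np.zeros(n_cases)           # case difficulty
    for _ in range(n_iter):
        logits = a * (theta[:, None] - b)        # (n_models, n_cases)
        p = 1.0 / (1.0 + np.exp(-logits))
        resid = correct - p                      # d(log-likelihood)/d(logits)
        theta += lr * (resid * a).sum(axis=1) / n_cases
        a     += lr * (resid * (theta[:, None] - b)).sum(axis=0) / n_models
        b     += lr * (-resid * a).sum(axis=0) / n_models
        theta -= theta.mean()                    # fix the latent scale
    return theta, a, b

def next_case(theta_hat, a, b, asked):
    """CAT-style step: pick the unasked case with maximal Fisher information
    I_j(theta) = a_j^2 * p_j * (1 - p_j) at the current ability estimate."""
    p = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))
    info = a ** 2 * p * (1.0 - p)
    info[list(asked)] = -np.inf
    return int(np.argmax(info))

# Toy usage: 5 hypothetical models scored on 50 cases.
rng = np.random.default_rng(0)
true_b = rng.normal(size=50)
true_theta = rng.normal(size=5)
responses = rng.binomial(1, 1 / (1 + np.exp(-(true_theta[:, None] - true_b))))
ability, discrimination, difficulty = fit_2pl(responses)
print("hardest cases:", np.argsort(-difficulty)[:5])
print("first case a CAT would administer:", next_case(0.0, discrimination, difficulty, set()))
```

In the setting the abstract describes, per-case difficulty estimates of this kind are what allow an MLC-style score to be computed from a small, adaptively chosen subset of cases (under 1% of the dataset) rather than from the full test set.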
Related papers
- Stabilizing Subject Transfer in EEG Classification with Divergence Estimation [17.924276728038304]
We propose several graphical models to describe an EEG classification task.
We identify statistical relationships that should hold true in an idealized training scenario.
We design regularization penalties to enforce these relationships in two stages.
arXiv Detail & Related papers (2023-10-12T23:06:52Z)
- Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of some established sequential supervised models, such as RNNs, LSTMs, or Transformers, with relatively less known alternative models based on reservoir computing.
arXiv Detail & Related papers (2022-09-29T08:16:52Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- Evaluating Machine Unlearning via Epistemic Uncertainty [78.27542864367821]
This work presents an evaluation of Machine Unlearning algorithms based on uncertainty.
To the best of our knowledge, this is the first general definition of such an evaluation.
arXiv Detail & Related papers (2022-08-23T09:37:31Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation [4.722075132982135]
A major challenge is to determine when the level of incorrectness, e.g., model accuracy or F1 score for classifiers, is acceptable.
We have developed complexity measures, which quantify how difficult given observations are to assign to their true class label.
These measures are superior to the best practice baseline in that, for a linear computation cost, they also quantify each observation's classification complexity in an explainable form.
arXiv Detail & Related papers (2021-12-22T12:17:08Z)
- Data vs classifiers, who wins? [0.0]
The classification experiments covered by machine learning (ML) are composed of two important parts: the data and the algorithm.
Data complexity is commonly not considered along with the model during a performance evaluation.
Recent studies employ Item Response Theory (IRT) as a new approach to evaluating datasets and algorithms.
arXiv Detail & Related papers (2021-07-15T16:55:15Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data efficiently while maintaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Does imputation matter? Benchmark for predictive models [5.802346990263708]
This paper systematically evaluates the empirical effectiveness of data imputation algorithms for predictive models.
The main contributions are (1) the recommendation of a general method for empirical benchmarking based on real-life classification tasks.
arXiv Detail & Related papers (2020-07-06T15:47:36Z)