On the Value of ML Models
- URL: http://arxiv.org/abs/2112.06775v1
- Date: Mon, 13 Dec 2021 16:32:08 GMT
- Title: On the Value of ML Models
- Authors: Fabio Casati, Pierre-André Noël and Jie Yang
- Abstract summary: We argue that, when establishing and benchmarking Machine Learning (ML) models, the research community should favour evaluation metrics that better capture the value delivered by their model in practical applications.
For a specific class of use cases -- selective classification -- we show that not only is this simple to do, but that it has important consequences and provides insights into what to look for in a ``good'' ML model.
- Score: 7.301530330533432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We argue that, when establishing and benchmarking Machine Learning (ML)
models, the research community should favour evaluation metrics that better
capture the value delivered by their model in practical applications. For a
specific class of use cases -- selective classification -- we show that not
only can it be simple enough to do, but that it has important consequences and
provides insights into what to look for in a ``good'' ML model.
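For the selective-classification use case named in the abstract, "value" can be made concrete in a few lines of code. Below is a minimal sketch, not the paper's exact formulation: an accepted correct prediction earns a gain, an accepted error incurs a cost, and a rejected item incurs the cost of deferring to a human. The payoff numbers and the confidence-threshold abstention rule are illustrative assumptions.

```python
import numpy as np

def selective_value(probs, labels, threshold,
                    gain_correct=1.0, cost_error=5.0, cost_defer=0.5):
    """Average value per item for a selective classifier.

    probs:  (n, k) predicted class probabilities
    labels: (n,)   true class indices
    threshold: confidence below which the model abstains (defers to a human)
    The payoff numbers are illustrative assumptions, not values from the paper.
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    confidence = probs.max(axis=1)
    predictions = probs.argmax(axis=1)

    accepted = confidence >= threshold
    correct = predictions == labels

    value = (gain_correct * np.sum(accepted & correct)
             - cost_error * np.sum(accepted & ~correct)
             - cost_defer * np.sum(~accepted))
    return value / len(labels)

# Toy usage: a model that abstains on its least confident predictions.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 3, size=1000)
    logits = rng.normal(size=(1000, 3))
    logits[np.arange(1000), labels] += 1.5   # make the model mostly right
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    for t in (0.0, 0.5, 0.7, 0.9):
        print(f"threshold={t:.1f}  value/item={selective_value(probs, labels, t):+.3f}")
```

Sweeping the threshold makes the trade-off explicit: the model that looks best by raw accuracy need not be the one that delivers the most value once errors and deferrals carry different costs.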
Related papers
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce UNCURL, an adaptive task-aware pruning technique that reduces the number of experts per MoE layer offline, after training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- A General Framework for Data-Use Auditing of ML Models [47.369572284751285]
We propose a general method to audit an ML model for the use of a data-owner's data in training.
We show the effectiveness of our proposed framework by applying it to audit data use in two types of ML models.
arXiv Detail & Related papers (2024-07-21T09:32:34Z)
- OLMES: A Standard for Language Model Evaluations [64.85905119836818]
We propose OLMES, a practical, open standard for reproducible language model evaluations.
We identify and review the varying factors in evaluation practices adopted by the community.
OLMES supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions.
arXiv Detail & Related papers (2024-06-12T17:37:09Z)
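As an aside on the "cloze" formulation mentioned in the OLMES entry above, a minimal sketch follows: each answer string is scored by the log-likelihood the model assigns to it as a continuation of the question, optionally length-normalized. The `log_likelihood` callable and the toy scorer are assumptions of this sketch, not OLMES APIs.

```python
from typing import Callable, Sequence

def cloze_choice(question: str,
                 choices: Sequence[str],
                 log_likelihood: Callable[[str, str], float],
                 length_normalize: bool = True) -> int:
    """Pick the answer whose text the model finds most likely as a continuation.

    log_likelihood(context, continuation) must return the total log-probability
    of `continuation` given `context`; this callable is an assumed interface.
    """
    scores = []
    for choice in choices:
        score = log_likelihood(question + " ", choice)
        if length_normalize:
            score /= max(len(choice), 1)   # crude per-character normalization
        scores.append(score)
    return max(range(len(choices)), key=lambda i: scores[i])

# Toy usage with a fake scorer that rewards overlap with the question.
def toy_log_likelihood(context: str, continuation: str) -> float:
    overlap = len(set(context.lower().split()) & set(continuation.lower().split()))
    return float(overlap) - 0.01 * len(continuation)

question = "Water freezes at what temperature in degrees Celsius?"
choices = ["0 degrees Celsius", "100 degrees Celsius", "50 degrees Fahrenheit"]
print(choices[cloze_choice(question, choices, toy_log_likelihood)])
```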
- Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead.
We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
arXiv Detail & Related papers (2024-06-12T16:41:31Z)
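The entry above describes teaching models to estimate their own uncertainty by training on correct and incorrect answers. The sketch below replaces the paper's fine-tuning with a much cheaper stand-in, a logistic probe over answer features labeled by correctness; the features, data, and probe are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed setup (not from the paper): for each question we already have some
# cheap features of the model's answer, e.g. mean token log-probability and
# answer length, plus a 0/1 label saying whether that answer was correct.
rng = np.random.default_rng(0)
n = 2000
mean_logprob = rng.normal(-1.0, 0.5, size=n)            # higher = more confident
answer_len = rng.integers(1, 60, size=n).astype(float)
# Synthetic ground truth: confident, short answers are more often correct.
p_correct = 1 / (1 + np.exp(-(2.0 * (mean_logprob + 1.0) - 0.02 * answer_len)))
is_correct = rng.random(n) < p_correct

features = np.column_stack([mean_logprob, answer_len])
probe = LogisticRegression().fit(features[:1500], is_correct[:1500])

# The probe's predicted probability of correctness acts as an uncertainty estimate.
confidence = probe.predict_proba(features[1500:])[:, 1]
print("mean confidence on held-out answers:", confidence.mean().round(3))
print("held-out accuracy of thresholding at 0.5:",
      ((confidence > 0.5) == is_correct[1500:]).mean().round(3))
```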
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
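The boosting idea in the entry above can be sketched with classic AdaBoost; here a weighted decision stump stands in for the paper's prompt-based LLM weak learner, which is an assumption of this sketch rather than the paper's setup.

```python
import numpy as np

def fit_stump(X, y, w):
    """Best weighted decision stump. In the paper's setting the weak learner
    would instead be a prompt-based LLM classifier (swapped out in this sketch)."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, (j, thr, sign))
    return best[1], best[0]

def stump_predict(stump, X):
    j, thr, sign = stump
    return np.where(X[:, j] > thr, sign, -sign)

def adaboost(X, y, rounds=40):
    """Classic AdaBoost with labels in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this weak learner
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)          # up-weight examples it got wrong
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def ensemble_predict(ensemble, X):
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy usage: a diagonal boundary that no single axis-aligned stump can match.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
model = adaboost(X, y)
print("boosted training accuracy:", (ensemble_predict(model, X) == y).mean())
```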
- Evaluating Representations with Readout Model Switching [18.475866691786695]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z)
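A much-simplified sketch of the MDL evaluation idea from the entry above: labels are coded prequentially, block by block, with a readout trained only on earlier data, and the total code length in bits scores the representation. The single scikit-learn logistic readout and the synthetic representations are assumptions; the paper's hybrid discrete/continuous model space and switching strategy are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def prequential_code_length(Z, y, block=50, eps=1e-12):
    """Online (prequential) description length of labels given representations Z.

    Each block of labels is coded with a readout trained only on earlier data;
    before enough data is seen, a uniform distribution is used. Lower total code
    length (in bits) means the representation makes the labels easier to predict.
    This is a simplified sketch of the MDL idea, not the paper's switching method.
    """
    n, k = len(y), len(np.unique(y))
    total_bits = 0.0
    for start in range(0, n, block):
        end = min(start + block, n)
        if start < k * 2:                       # too little data: code uniformly
            p = np.full((end - start, k), 1.0 / k)
        else:
            readout = LogisticRegression(max_iter=1000).fit(Z[:start], y[:start])
            p = readout.predict_proba(Z[start:end])
        total_bits += -np.log2(p[np.arange(end - start), y[start:end]] + eps).sum()
    return total_bits

# Toy usage: compare two "representations" of the same labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=600)
good_repr = y[:, None] + 0.3 * rng.normal(size=(600, 4))   # informative features
bad_repr = rng.normal(size=(600, 4))                        # uninformative features
print("bits with informative representation:", round(prequential_code_length(good_repr, y), 1))
print("bits with random representation:     ", round(prequential_code_length(bad_repr, y), 1))
```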
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- Rethinking and Recomputing the Value of ML Models [28.80821411530123]
We argue that the way we have been training and evaluating ML models has largely overlooked the fact that they are applied in an organizational or societal context.
We show that with this perspective we fundamentally change how we evaluate, select and deploy ML models.
arXiv Detail & Related papers (2022-09-30T01:02:31Z)
- Evaluation of HTR models without Ground Truth Material [2.4792948967354236]
The evaluation of Handwritten Text Recognition (HTR) models during their development is straightforward.
But the evaluation process becomes tricky as soon as we switch from development to application.
We show that lexicon-based evaluation can compete with evaluation based on ground truth.
arXiv Detail & Related papers (2022-01-17T01:26:09Z)
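One simple lexicon-based signal of the kind discussed in the HTR entry above: with no ground truth transcription, the share of recognized word tokens found in a reference lexicon serves as a rough quality proxy. This is a sketch of the general idea, not the paper's exact metric, and the tiny lexicon is an illustrative assumption.

```python
import re

def lexicon_hit_rate(recognized_text, lexicon):
    """Share of recognized word tokens that appear in a reference lexicon.

    With no ground truth available, a low hit rate is a warning sign that the
    HTR output contains many mis-recognized words. This is one simple
    lexicon-based signal, not the paper's exact metric.
    """
    tokens = re.findall(r"[^\W\d_]+", recognized_text.lower())
    if not tokens:
        return 0.0
    hits = sum(token in lexicon for token in tokens)
    return hits / len(tokens)

# Toy usage with a tiny lexicon; a real one would come from period dictionaries
# or previously transcribed material.
lexicon = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}
clean_output = "The quick brown fox jumps over the lazy dog"
noisy_output = "The qvick hrown fox jumbs ovr the lazv dog"
print("clean:", lexicon_hit_rate(clean_output, lexicon))   # 1.0
print("noisy:", lexicon_hit_rate(noisy_output, lexicon))   # noticeably lower
```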
- Insights into Performance Fitness and Error Metrics for Machine Learning [1.827510863075184]
Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis.
This paper examines a number of the most commonly-used performance fitness and error metrics for regression and classification algorithms.
arXiv Detail & Related papers (2020-05-17T22:59:04Z)
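For reference, several of the standard metrics surveyed in the last entry take only a few lines to compute; the sketch below covers accuracy, precision, recall, F1, MAE and RMSE. As the main paper argues, none of these numbers by themselves capture the value a deployed model delivers.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": float(np.mean(y_true == y_pred)), "precision": precision,
            "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """Mean absolute error and root-mean-square error."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return {"mae": float(np.mean(np.abs(err))), "rmse": float(np.sqrt(np.mean(err ** 2)))}

print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
print(regression_metrics([2.0, 3.5, 5.0], [2.2, 3.0, 4.1]))
```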
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.