Rethinking and Recomputing the Value of ML Models
- URL: http://arxiv.org/abs/2209.15157v1
- Date: Fri, 30 Sep 2022 01:02:31 GMT
- Title: Rethinking and Recomputing the Value of ML Models
- Authors: Burcu Sayin, Fabio Casati, Andrea Passerini, Jie Yang, Xinyue Chen
- Abstract summary: We argue that the way we have been training and evaluating ML models has largely forgotten the fact that they are applied in an organization or societal context.
We show that with this perspective we fundamentally change how we evaluate, select and deploy ML models.
- Score: 28.80821411530123
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we argue that the way we have been training and evaluating ML
models has largely forgotten the fact that they are applied in an organization
or societal context as they provide value to people. We show that with this
perspective we fundamentally change how we evaluate, select and deploy ML
models - and to some extent even what it means to learn. Specifically, we
stress that the notion of value plays a central role in learning and
evaluating, and different models may require different learning practices and
provide different values based on the application context they are applied. We
also show that this concretely impacts how we select and embed models into
human workflows based on experimental datasets. Nothing of what is presented
here is hard: to a large extent is a series of fairly trivial observations with
massive practical implications.
Related papers
- OLMES: A Standard for Language Model Evaluations [64.85905119836818]
We propose OLMES, a practical, open standard for reproducible language model evaluations.
We identify and review the varying factors in evaluation practices adopted by the community.
OLMES supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions.
arXiv Detail & Related papers (2024-06-12T17:37:09Z) - When is an Embedding Model More Promising than Another? [33.540506562970776]
Embedders play a central role in machine learning, projecting any object into numerical representations that can be leveraged to perform various downstream tasks.
The evaluation of embedding models typically depends on domain-specific empirical approaches.
We present a unified approach to evaluate embedders, drawing upon the concepts of sufficiency and informativeness.
arXiv Detail & Related papers (2024-06-11T18:13:46Z) - What is it for a Machine Learning Model to Have a Capability? [0.0]
We develop an account of machine learning models' capabilities which can be usefully applied to the nascent science of model evaluation.
Our core proposal is a conditional analysis of model abilities (CAMA), crudely, a machine learning model has a capability to X just when it would reliably succeed at doing X if it 'tried'
arXiv Detail & Related papers (2024-05-14T23:03:52Z) - Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL)
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, textbfTopK + ConE, based on the assumption that textitthe performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z) - Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consecutively, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z) - On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
We evaluate four model families, OPT, BLOOM, CodeGen and Codex on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Evaluation Gaps in Machine Learning Practice [13.963766987258161]
In practice, evaluations of machine learning models frequently focus on a narrow range of decontextualized predictive behaviours.
We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations.
By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts.
arXiv Detail & Related papers (2022-05-11T04:00:44Z) - On the Value of ML Models [7.301530330533432]
We argue that, when establishing and benchmarking Machine Learning (ML) models, the research community should favour evaluation metrics that better capture the value delivered by their model in practical applications.
For a specific class of use cases -- selective classification -- we show that not only can it be simple enough to do, but that it has import consequences and provides insights what to look for in a good'' ML model.
arXiv Detail & Related papers (2021-12-13T16:32:08Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - Insights into Performance Fitness and Error Metrics for Machine Learning [1.827510863075184]
Machine learning (ML) is the field of training machines to achieve high level of cognition and perform human-like analysis.
This paper examines a number of the most commonly-used performance fitness and error metrics for regression and classification algorithms.
arXiv Detail & Related papers (2020-05-17T22:59:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.