Rethinking and Recomputing the Value of ML Models
- URL: http://arxiv.org/abs/2209.15157v1
- Date: Fri, 30 Sep 2022 01:02:31 GMT
- Title: Rethinking and Recomputing the Value of ML Models
- Authors: Burcu Sayin, Fabio Casati, Andrea Passerini, Jie Yang, Xinyue Chen
- Abstract summary: We argue that the way we have been training and evaluating ML models has largely forgotten the fact that they are applied in an organization or societal context.
We show that with this perspective we fundamentally change how we evaluate, select and deploy ML models.
- Score: 28.80821411530123
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we argue that the way we have been training and evaluating ML
models has largely forgotten the fact that they are applied in an organization
or societal context as they provide value to people. We show that with this
perspective we fundamentally change how we evaluate, select and deploy ML
models - and to some extent even what it means to learn. Specifically, we
stress that the notion of value plays a central role in learning and
evaluating, and different models may require different learning practices and
provide different values depending on the application context in which they are applied. We
also show that this concretely impacts how we select and embed models into
human workflows based on experimental datasets. Nothing of what is presented
here is hard: to a large extent it is a series of fairly trivial observations with
massive practical implications.
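To make the notion of value concrete, here is a minimal sketch, with invented payoff numbers, of how scoring models by an application-level gain/cost matrix can reorder a comparison that plain accuracy calls a tie; it illustrates the general idea rather than the paper's formal definition.

```python
import numpy as np

# Illustrative payoffs for a binary screening task: catching a positive is
# worth a lot, a miss costs a lot, a false alarm costs a little.
PAYOFF = {  # (true label, predicted label) -> value to the organization
    (1, 1): 10.0, (1, 0): -20.0,
    (0, 0): 0.5,  (0, 1): -1.0,
}

def value_per_case(y_true, y_pred, payoff=PAYOFF):
    """Average payoff over all cases, using the (made-up) matrix above."""
    return float(np.mean([payoff[(int(t), int(p))] for t, p in zip(y_true, y_pred)]))

# Two hypothetical models with identical accuracy but different error profiles.
y_true  = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
model_a = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])  # misses half the positives
model_b = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])  # catches them all, two false alarms

for name, pred in [("A", model_a), ("B", model_b)]:
    acc = (pred == y_true).mean()
    print(f"model {name}: accuracy={acc:.1f}  value/case={value_per_case(y_true, pred):.2f}")
```

With these made-up payoffs the two models tie at 80% accuracy, yet model A loses value on average while model B earns it, which is exactly the kind of reordering the abstract argues we should care about.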
Related papers
- VHELM: A Holistic Evaluation of Vision Language Models [75.88987277686914]
We present the Holistic Evaluation of Vision Language Models (VHELM).
VHELM aggregates various datasets to cover one or more of the 9 aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety.
Our framework is designed to be lightweight and automatic so that evaluation runs are cheap and fast.
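As a rough illustration of the aggregation step described above (not VHELM's actual scoring code), the sketch below rolls hypothetical per-dataset scores up into aspect-level averages; the dataset names and numbers are made up, and only the aspect labels come from the summary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-dataset scores, each tagged with the aspect it probes.
results = [
    ("vqa-style-benchmark", "visual perception", 0.71),
    ("chart-reading-set", "visual perception", 0.64),
    ("toxicity-probe", "toxicity", 0.88),
    ("bias-probe", "bias", 0.52),
]

by_aspect = defaultdict(list)
for dataset, aspect, score in results:
    by_aspect[aspect].append(score)

for aspect, scores in sorted(by_aspect.items()):
    print(f"{aspect:>18}: {mean(scores):.2f} over {len(scores)} dataset(s)")
```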
arXiv Detail & Related papers (2024-10-09T17:46:34Z)
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating on different benchmarks with proper strategies, even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
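For readers who want the distillation objective this line of work builds on in code, below is the textbook temperature-scaled knowledge-distillation loss in PyTorch; it is a generic formulation, not LLAVADI's specific recipe, and the temperature and mixing weight are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of soft-target KL (teacher -> student) and ordinary cross-entropy.
    T and alpha are illustrative hyperparameters, not values from the paper."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy shapes: batch of 4 examples, 10-way classification.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```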
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
- OLMES: A Standard for Language Model Evaluations [64.85905119836818]
We propose OLMES, a practical, open standard for reproducible language model evaluations.
We identify and review the varying factors in evaluation practices adopted by the community.
OLMES supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions and larger models that can utilize the original format.
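The "cloze" formulation mentioned here scores each answer option by the likelihood the model assigns to it as a continuation of the question, instead of asking the model to emit a letter. Below is a minimal sketch of one common variant (length-normalized log-likelihood) using Hugging Face transformers; it illustrates the general technique, not OLMES's prescribed procedure, and gpt2 is used only to keep the example small.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM; gpt2 keeps the example small
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def option_score(question: str, option: str) -> float:
    """Length-normalized log-probability of the option given the question (cloze-style)."""
    q_ids = tok(question, return_tensors="pt").input_ids
    full_ids = tok(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    option_positions = range(q_ids.shape[1] - 1, full_ids.shape[1] - 1)
    token_lp = [log_probs[i, full_ids[0, i + 1]].item() for i in option_positions]
    return sum(token_lp) / len(token_lp)

question = "The capital of France is"
options = ["Paris", "Berlin", "Madrid"]
print(max(options, key=lambda opt: option_score(question, opt)))
```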
arXiv Detail & Related papers (2024-06-12T17:37:09Z)
- A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence [2.498836880652668]
We present a novel framework for thinking about the deployment of machine learning models in a performative, human-ML collaborative system.
In our framework, the introduction of ML recommendations changes the data-generating process of human decisions.
We find that for many levels of ML performance, humans can improve upon the ML predictions.
arXiv Detail & Related papers (2024-05-22T15:38:30Z)
- What is it for a Machine Learning Model to Have a Capability? [0.0]
We develop an account of machine learning models' capabilities which can be usefully applied to the nascent science of model evaluation.
Our core proposal is a conditional analysis of model abilities (CAMA): crudely, a machine learning model has a capability to X just when it would reliably succeed at doing X if it 'tried'.
arXiv Detail & Related papers (2024-05-14T23:03:52Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
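Of the two strategies named above, reweighing is the easier one to sketch; below is a minimal scikit-learn illustration in which training examples are weighted so that each (group, label) cell contributes as if group and label were independent. This mirrors the classic reweighing scheme of Kamiran and Calders and is offered as a generic illustration, not necessarily the paper's exact procedure; the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighing_weights(groups, labels):
    """Weight w(g, y) = P(g) * P(y) / P(g, y): up-weights under-represented
    (group, label) combinations so the classifier does not simply track
    the majority population's trend."""
    groups, labels = np.asarray(groups), np.asarray(labels)
    weights = np.empty(len(labels))
    for g in np.unique(groups):
        for y in np.unique(labels):
            mask = (groups == g) & (labels == y)
            if mask.any():
                expected = (groups == g).mean() * (labels == y).mean()
                weights[mask] = expected / mask.mean()
    return weights

# Toy data: group 1 is a small minority with a shifted label distribution.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
groups = (rng.random(500) < 0.1).astype(int)
y = ((X[:, 0] + 0.8 * groups + rng.normal(0, 0.5, 500)) > 0).astype(int)

clf = LogisticRegression().fit(X, y, sample_weight=reweighing_weights(groups, y))
print("training accuracy:", clf.score(X, y))
```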
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Evaluation Gaps in Machine Learning Practice [13.963766987258161]
In practice, evaluations of machine learning models frequently focus on a narrow range of decontextualized predictive behaviours.
We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations.
By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts.
arXiv Detail & Related papers (2022-05-11T04:00:44Z)
- On the Value of ML Models [7.301530330533432]
We argue that, when establishing and benchmarking Machine Learning (ML) models, the research community should favour evaluation metrics that better capture the value delivered by their model in practical applications.
For a specific class of use cases -- selective classification -- we show that not only can it be simple enough to do, but that it has important consequences and provides insights into what to look for in a "good" ML model.
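A small sketch of the selective-classification setting referenced here: the model abstains and hands the case to a human whenever its confidence falls below a threshold, and we track value per case rather than accuracy as that threshold moves. The payoffs, the simulated classifier, and the thresholds are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
y = rng.integers(0, 2, n)
# Simulated predicted probability of class 1, deliberately noisy.
p1 = np.clip(0.5 * y + rng.normal(0.25, 0.25, n), 0.0, 1.0)

V_CORRECT, V_WRONG, V_HUMAN = 1.0, -4.0, -0.3  # illustrative payoffs per case

pred = (p1 >= 0.5).astype(int)
conf = np.where(pred == 1, p1, 1.0 - p1)  # confidence in the predicted class

print("threshold  coverage  accuracy(accepted)  value/case")
for tau in [0.5, 0.6, 0.7, 0.8, 0.9]:
    accept = conf >= tau
    correct = (pred == y) & accept
    value = (correct.sum() * V_CORRECT
             + (accept & ~correct).sum() * V_WRONG
             + (~accept).sum() * V_HUMAN) / n
    acc = correct.sum() / max(int(accept.sum()), 1)
    print(f"{tau:9.1f}  {accept.mean():8.2f}  {acc:18.2f}  {value:10.2f}")
```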
arXiv Detail & Related papers (2021-12-13T16:32:08Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
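In the spirit of the attribute-bucketed analysis such interpretable evaluation relies on, here is a tiny hand-rolled sketch that breaks entity-level accuracy down by entity length; the data and the attribute choice are made up, and the authors' actual methodology and tool are richer than this toy.

```python
from collections import defaultdict

# (entity text, gold label, predicted label) triples -- toy data.
predictions = [
    ("Obama", "PER", "PER"),
    ("New York City", "LOC", "ORG"),
    ("EU", "ORG", "ORG"),
    ("University of Trento", "ORG", "ORG"),
    ("Rio", "LOC", "LOC"),
    ("World Health Organization", "ORG", "LOC"),
]

def length_bucket(entity: str) -> str:
    n = len(entity.split())
    return "1 token" if n == 1 else ("2-3 tokens" if n <= 3 else "4+ tokens")

per_bucket = defaultdict(lambda: [0, 0])  # bucket -> [correct, total]
for text, gold, pred in predictions:
    bucket = length_bucket(text)
    per_bucket[bucket][1] += 1
    per_bucket[bucket][0] += int(gold == pred)

for bucket, (correct, total) in sorted(per_bucket.items()):
    print(f"{bucket:>10}: accuracy {correct}/{total} = {correct/total:.2f}")
```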
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Insights into Performance Fitness and Error Metrics for Machine Learning [1.827510863075184]
Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis.
This paper examines a number of the most commonly-used performance fitness and error metrics for regression and classification algorithms.
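As a companion to the survey above, here is a compact from-scratch implementation of a few of the usual metrics (MAE, RMSE, and R² for regression; accuracy, precision, recall, and F1 for binary classification); the formulas are standard and the data is toy.

```python
import numpy as np

def regression_metrics(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    err = y - yhat
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    r2 = 1.0 - (err ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

def classification_metrics(y, yhat):
    y, yhat = np.asarray(y), np.asarray(yhat)
    tp = ((y == 1) & (yhat == 1)).sum()
    fp = ((y == 0) & (yhat == 1)).sum()
    fn = ((y == 1) & (yhat == 0)).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (y == yhat).mean(), "precision": precision,
            "recall": recall, "F1": f1}

print(regression_metrics([3.0, 5.0, 2.5], [2.8, 5.4, 2.0]))
print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```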
arXiv Detail & Related papers (2020-05-17T22:59:04Z)