Extending Variability-Aware Model Selection with Bias Detection in
Machine Learning Projects
- URL: http://arxiv.org/abs/2311.14214v1
- Date: Thu, 23 Nov 2023 22:08:29 GMT
- Title: Extending Variability-Aware Model Selection with Bias Detection in
Machine Learning Projects
- Authors: Cristina Tavares, Nathalia Nascimento, Paulo Alencar, Donald Cowan
- Abstract summary: This paper describes work on extending an adaptive variability-aware model selection method with bias detection in machine learning projects.
The proposed approach aims to advance the state of the art by making explicit factors that influence model selection, particularly those related to bias, as well as their interactions.
- Score: 0.7646713951724013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data science projects often involve various machine learning (ML) methods
that depend on data, code, and models. One of the key activities in these
projects is the selection of a model or algorithm that is appropriate for the
data analysis at hand. ML model selection depends on several factors, which
include data-related attributes such as sample size, functional requirements
such as the prediction algorithm type, and non-functional requirements such as
performance and bias. However, the factors that influence such selection are
often not well understood and explicitly represented. This paper describes
ongoing work on extending an adaptive variability-aware model selection method
with bias detection in ML projects. The method involves: (i) modeling the
variability of the factors that affect model selection using feature models
based on heuristics proposed in the literature; (ii) instantiating our
variability model with added features related to bias (e.g., bias-related
metrics); and (iii) conducting experiments that illustrate the method in a
specific case study based on a heart failure prediction project. The proposed
approach aims to advance the state of the art
by making explicit factors that influence model selection, particularly those
related to bias, as well as their interactions. The provided representations
can transform model selection in ML projects into a non-ad-hoc, adaptive, and
explainable process.
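The core idea of step (i), representing model-selection factors as a feature model, can be sketched in code. The sketch below is illustrative only, not the authors' implementation: the feature names, the XOR group, and the cross-tree constraint are assumptions chosen to mirror the factors named in the abstract (data attributes, functional requirements, and bias-related non-functional requirements).

```python
# Illustrative sketch (not the paper's tooling): a toy feature model for ML
# model selection, with bias-related features and a simple validity check.
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    mandatory: bool = False
    children: list = field(default_factory=list)


# A miniature variability model: data attributes, functional requirements,
# and non-functional requirements (bias metrics sit under the latter).
model_selection = Feature("ModelSelection", mandatory=True, children=[
    Feature("DataAttributes", mandatory=True, children=[
        Feature("SmallSample"), Feature("LargeSample")]),
    Feature("FunctionalRequirements", mandatory=True, children=[
        Feature("Classification"), Feature("Regression")]),
    Feature("NonFunctionalRequirements", children=[
        Feature("Performance"),
        Feature("Bias", children=[
            Feature("DemographicParity"), Feature("EqualizedOdds")])]),
])


def valid(selected: set) -> bool:
    """Check two assumed constraints: bias metrics require the Bias feature,
    and the sample-size alternatives form an XOR group."""
    bias_metrics = {"DemographicParity", "EqualizedOdds"}
    if selected & bias_metrics and "Bias" not in selected:
        return False
    if {"SmallSample", "LargeSample"} <= selected:
        return False
    return True


config = {"LargeSample", "Classification", "Bias", "EqualizedOdds"}
print(valid(config))  # a consistent configuration
```

In a variability-aware approach, each valid configuration of such a model corresponds to one admissible combination of selection factors, which is what makes the factors and their interactions explicit.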
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z)
- Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL), though performance varies with the choice of demonstrations.
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z)
- Subjectivity in Unsupervised Machine Learning Model Selection [2.9370710299422598]
This study uses the Hidden Markov Model as an example to investigate the subjectivity involved in model selection.
Sources of subjectivity include differing opinions on the importance of different criteria and metrics, differing views on how parsimonious a model should be, and how the size of a dataset should influence model selection.
arXiv Detail & Related papers (2023-09-01T01:40:58Z)
- A prediction and behavioural analysis of machine learning methods for modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z)
- Selecting Treatment Effects Models for Domain Adaptation Using Causal Knowledge [82.5462771088607]
We propose a novel model selection metric specifically designed for individual treatment effect (ITE) methods under the unsupervised domain adaptation setting.
In particular, we propose selecting models whose predictions of interventions' effects satisfy known causal structures in the target domain.
arXiv Detail & Related papers (2021-02-11T21:03:14Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
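The notion of a range of attainable group-level disparities over a set of comparably performing models can be illustrated with a small sketch. This is not the paper's algorithm; the synthetic data, the toy "set of good models" (simple threshold classifiers), and the demographic-parity metric are all assumptions made for illustration.

```python
# Minimal sketch (assumed setup): given several candidate models, report the
# range of group-level disparity in positive prediction rates.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
group = rng.integers(0, 2, size=500)  # sensitive attribute: 0 or 1


def disparity(pred, group):
    """Demographic-parity difference: |P(pred=1 | g=1) - P(pred=1 | g=0)|."""
    return abs(pred[group == 1].mean() - pred[group == 0].mean())


# A toy "set of good models": threshold classifiers on different features.
models = [lambda X, j=j: (X[:, j] > 0).astype(int) for j in range(3)]
disparities = [disparity(m(X), group) for m in models]
print(f"disparity range: [{min(disparities):.3f}, {max(disparities):.3f}]")
```

The point of the exercise: models that look interchangeable on overall accuracy can differ substantially in the disparity they induce, so the attainable range is itself informative for bias-aware model selection.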
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Model-specific Data Subsampling with Influence Functions [37.64859614131316]
We develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence.
Specifically, we leverage influence functions to guide our selection strategy, proving theoretically, and demonstrating empirically that our approach quickly selects high-quality models.
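The flavor of influence-guided subsampling can be sketched for the special case of ridge regression, where leave-one-out influence has a closed form via the hat matrix. This is an illustrative stand-in, not the paper's method; the data, regularizer, and ranking rule are assumptions.

```python
# Sketch (assumed setup): rank training points by closed-form leave-one-out
# influence for ridge regression, then keep the most influential points
# instead of sampling uniformly at random.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.1, size=n)

lam = 1e-2
A = X.T @ X + lam * np.eye(d)
H = X @ np.linalg.solve(A, X.T)          # hat matrix of the ridge fit
h = np.diag(H)
resid = y - H @ y
influence = np.abs(resid) * h / (1 - h)  # magnitude of LOO prediction change

k = 50
top_idx = np.argsort(influence)[-k:]     # the k most influential points
```

Points with large influence are those whose removal would most change the fitted predictions, which is the intuition behind preferring them over a uniform random subsample.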
arXiv Detail & Related papers (2020-10-20T12:10:28Z)
- Feature Selection Methods for Uplift Modeling and Heterogeneous Treatment Effect [1.349645012479288]
Uplift modeling is a causal learning technique that estimates subgroup-level treatment effects.
Traditional feature selection methods are not fit for the task.
We introduce a set of feature selection methods explicitly designed for uplift modeling.
arXiv Detail & Related papers (2020-05-05T00:28:18Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.