Extending Variability-Aware Model Selection with Bias Detection in
Machine Learning Projects
- URL: http://arxiv.org/abs/2311.14214v1
- Date: Thu, 23 Nov 2023 22:08:29 GMT
- Title: Extending Variability-Aware Model Selection with Bias Detection in
Machine Learning Projects
- Authors: Cristina Tavares, Nathalia Nascimento, Paulo Alencar, Donald Cowan
- Abstract summary: This paper describes work on extending an adaptive variability-aware model selection method with bias detection in machine learning projects.
The proposed approach aims to advance the state of the art by making explicit factors that influence model selection, particularly those related to bias, as well as their interactions.
- Score: 0.7646713951724013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data science projects often involve various machine learning (ML) methods
that depend on data, code, and models. One of the key activities in these
projects is the selection of a model or algorithm that is appropriate for the
data analysis at hand. ML model selection depends on several factors, which
include data-related attributes such as sample size, functional requirements
such as the prediction algorithm type, and non-functional requirements such as
performance and bias. However, the factors that influence such selection are
often not well understood and explicitly represented. This paper describes
ongoing work on extending an adaptive variability-aware model selection method
with bias detection in ML projects. The method involves: (i) modeling the
variability of the factors that affect model selection using feature models
based on heuristics proposed in the literature; (ii) instantiating our
variability model with added features related to bias (e.g., bias-related
metrics); and (iii) conducting experiments that illustrate the method in a
specific case study based on a heart failure prediction project. The proposed
approach aims to advance the state of the art
by making explicit factors that influence model selection, particularly those
related to bias, as well as their interactions. The provided representations
can transform model selection in ML projects into a non-ad-hoc, adaptive, and
explainable process.
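The core idea of step (i), representing model-selection factors as a feature model, can be sketched in code. The sketch below is illustrative only, not the authors' implementation: the feature names, the XOR group, and the cross-tree constraint are assumptions chosen to mirror the factors named in the abstract (data attributes, functional requirements, and bias-related non-functional requirements).

```python
# Illustrative sketch (not the paper's tooling): a toy feature model for ML
# model selection, with bias-related features and a simple validity check.
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    mandatory: bool = False
    children: list = field(default_factory=list)


# A miniature variability model: data attributes, functional requirements,
# and non-functional requirements (bias metrics sit under the latter).
model_selection = Feature("ModelSelection", mandatory=True, children=[
    Feature("DataAttributes", mandatory=True, children=[
        Feature("SmallSample"), Feature("LargeSample")]),
    Feature("FunctionalRequirements", mandatory=True, children=[
        Feature("Classification"), Feature("Regression")]),
    Feature("NonFunctionalRequirements", children=[
        Feature("Performance"),
        Feature("Bias", children=[
            Feature("DemographicParity"), Feature("EqualizedOdds")])]),
])


def valid(selected: set) -> bool:
    """Check two assumed constraints: bias metrics require the Bias feature,
    and the sample-size alternatives form an XOR group."""
    bias_metrics = {"DemographicParity", "EqualizedOdds"}
    if selected & bias_metrics and "Bias" not in selected:
        return False
    if {"SmallSample", "LargeSample"} <= selected:
        return False
    return True


config = {"LargeSample", "Classification", "Bias", "EqualizedOdds"}
print(valid(config))  # a consistent configuration
```

In a variability-aware approach, each valid configuration of such a model corresponds to one admissible combination of selection factors, which is what makes the factors and their interactions explicit.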
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z)
- Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL), though performance varies with the choice of demonstrations.
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z)
- Subjectivity in Unsupervised Machine Learning Model Selection [2.9370710299422598]
This study uses the Hidden Markov Model as an example to investigate the subjectivity involved in model selection.
Sources of subjectivity include differing opinions on the importance of different criteria and metrics, differing views on how parsimonious a model should be, and how the size of a dataset should influence model selection.
arXiv Detail & Related papers (2023-09-01T01:40:58Z)
- A prediction and behavioural analysis of machine learning methods for modelling travel mode choice [0.26249027950824505]
We conduct a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice.
Results indicate that the models with the highest disaggregate predictive performance provide poorer estimates of behavioural indicators and aggregate mode shares.
It is also observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.
arXiv Detail & Related papers (2023-01-11T11:10:32Z)
- Selecting Treatment Effects Models for Domain Adaptation Using Causal Knowledge [82.5462771088607]
We propose a novel model selection metric specifically designed for individual treatment effect (ITE) methods under the unsupervised domain adaptation setting.
In particular, we propose selecting models whose predictions of interventions' effects satisfy known causal structures in the target domain.
arXiv Detail & Related papers (2021-02-11T21:03:14Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
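The notion of a range of attainable group-level disparities over a set of comparably performing models can be illustrated with a small sketch. This is not the paper's algorithm; the synthetic data, the toy "set of good models" (simple threshold classifiers), and the demographic-parity metric are all assumptions made for illustration.

```python
# Minimal sketch (assumed setup): given several candidate models, report the
# range of group-level disparity in positive prediction rates.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
group = rng.integers(0, 2, size=500)  # sensitive attribute: 0 or 1


def disparity(pred, group):
    """Demographic-parity difference: |P(pred=1 | g=1) - P(pred=1 | g=0)|."""
    return abs(pred[group == 1].mean() - pred[group == 0].mean())


# A toy "set of good models": threshold classifiers on different features.
models = [lambda X, j=j: (X[:, j] > 0).astype(int) for j in range(3)]
disparities = [disparity(m(X), group) for m in models]
print(f"disparity range: [{min(disparities):.3f}, {max(disparities):.3f}]")
```

The point of the exercise: models that look interchangeable on overall accuracy can differ substantially in the disparity they induce, so the attainable range is itself informative for bias-aware model selection.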
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Model-specific Data Subsampling with Influence Functions [37.64859614131316]
We develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence.
Specifically, we leverage influence functions to guide our selection strategy, proving theoretically, and demonstrating empirically that our approach quickly selects high-quality models.
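The flavor of influence-guided subsampling can be sketched for the special case of ridge regression, where leave-one-out influence has a closed form via the hat matrix. This is an illustrative stand-in, not the paper's method; the data, regularizer, and ranking rule are assumptions.

```python
# Sketch (assumed setup): rank training points by closed-form leave-one-out
# influence for ridge regression, then keep the most influential points
# instead of sampling uniformly at random.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.1, size=n)

lam = 1e-2
A = X.T @ X + lam * np.eye(d)
H = X @ np.linalg.solve(A, X.T)          # hat matrix of the ridge fit
h = np.diag(H)
resid = y - H @ y
influence = np.abs(resid) * h / (1 - h)  # magnitude of LOO prediction change

k = 50
top_idx = np.argsort(influence)[-k:]     # the k most influential points
```

Points with large influence are those whose removal would most change the fitted predictions, which is the intuition behind preferring them over a uniform random subsample.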
arXiv Detail & Related papers (2020-10-20T12:10:28Z)
- Feature Selection Methods for Uplift Modeling and Heterogeneous Treatment Effect [1.349645012479288]
Uplift modeling is a causal learning technique that estimates subgroup-level treatment effects.
Traditional feature selection methods are not fit for the task.
We introduce a set of feature selection methods explicitly designed for uplift modeling.
arXiv Detail & Related papers (2020-05-05T00:28:18Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.