Subjectivity in Unsupervised Machine Learning Model Selection
- URL: http://arxiv.org/abs/2309.00201v2
- Date: Fri, 5 Jan 2024 05:54:58 GMT
- Title: Subjectivity in Unsupervised Machine Learning Model Selection
- Authors: Wanyi Chen, Mary L. Cummings
- Abstract summary: This study uses the Hidden Markov Model as an example to investigate the subjectivity involved in model selection.
Sources of subjectivity include differing opinions on the importance of different criteria and metrics, differing views on how parsimonious a model should be, and how the size of a dataset should influence model selection.
- Score: 2.9370710299422598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model selection is a necessary step in unsupervised machine learning. Despite
numerous criteria and metrics, model selection remains subjective. A high
degree of subjectivity may lead to questions about repeatability and
reproducibility of various machine learning studies and doubts about the
robustness of models deployed in the real world. Yet, the impact of modelers'
preferences on model selection outcomes remains largely unexplored. This study
uses the Hidden Markov Model as an example to investigate the subjectivity
involved in model selection. We asked 33 participants and three Large Language
Models (LLMs) to make model selections in three scenarios. Results revealed
variability and inconsistencies in both the participants' and the LLMs'
choices, especially when different criteria and metrics disagree. Sources of
subjectivity include varying opinions on the importance of different criteria
and metrics, differing views on how parsimonious a model should be, and how the
size of a dataset should influence model selection. The results underscore the
importance of developing a more standardized way to document subjective choices
made in model selection processes.
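As a concrete illustration of the dilemma, below is a minimal sketch, assuming the hmmlearn package, of how criteria such as AIC and BIC can rank candidate HMMs differently and leave the final choice to the modeler; the data and candidate state counts are illustrative, not the study's scenarios.

```python
# Fit Gaussian HMMs with different numbers of hidden states and compare
# two common selection criteria. Assumes `hmmlearn`; the data are synthetic
# placeholders, not the study's datasets.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 1))  # placeholder observation sequence

for n_states in (2, 3, 4, 5):
    model = GaussianHMM(n_components=n_states, random_state=0).fit(X)
    log_likelihood = model.score(X)
    # Free parameters of a 1-D Gaussian HMM: n(n-1) transitions,
    # n-1 start probabilities, one mean and one variance per state.
    k = n_states * (n_states - 1) + (n_states - 1) + 2 * n_states
    aic = 2 * k - 2 * log_likelihood
    bic = k * np.log(X.shape[0]) - 2 * log_likelihood
    print(f"{n_states} states: logL={log_likelihood:.1f} "
          f"AIC={aic:.1f} BIC={bic:.1f}")
```

Lower AIC or BIC is conventionally better, but BIC penalizes parameters more heavily as the dataset grows, so the two criteria can point to different state counts; adjudicating that disagreement is exactly the kind of subjective step the study documents.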
Related papers
- What matters when building vision-language models? [52.8539131958858]
We develop Idefics2, an efficient foundational vision-language model with 8 billion parameters.
Idefics2 achieves state-of-the-art performance within its size category across various multimodal benchmarks.
We release the model (base, instructed, and chat) along with the datasets created for its training.
arXiv Detail & Related papers (2024-05-03T17:00:00Z)
- Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects [0.7646713951724013]
This paper describes work on extending an adaptive variability-aware model selection method with bias detection in machine learning projects.
The proposed approach aims to advance the state of the art by making explicit factors that influence model selection, particularly those related to bias, as well as their interactions.
arXiv Detail & Related papers (2023-11-23T22:08:29Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate that no single model works best in all cases.
By choosing an appropriate bias model, we can obtain better robustness than baselines with more sophisticated model designs.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- Ensembling improves stability and power of feature selection for deep learning models [11.973624420202388]
In this paper, we show that inherent stochasticity in the design and training of deep learning models makes commonly used feature importance scores unstable.
We explore the ensembling of feature importance scores of models across different epochs and find that this simple approach can substantially address this issue.
We present a framework to combine the feature importance scores of trained models: instead of selecting features from one best model, we ensemble feature importance scores from numerous good models (sketched below).
arXiv Detail & Related papers (2022-10-02T19:07:53Z)
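A minimal sketch of the ensembling idea described above, using scikit-learn random forests trained with different seeds as a stand-in for the paper's deep learning models and training epochs; the simple averaging rule is an illustrative choice, not the authors' exact framework.

```python
# Average feature importance scores across several comparably good models
# instead of trusting the ranking from a single "best" model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Models differing only in their random seed yield different (unstable)
# importance rankings.
scores = np.stack([
    RandomForestClassifier(random_state=seed).fit(X, y).feature_importances_
    for seed in range(5)
])

mean_importance = scores.mean(axis=0)        # ensemble the scores
ranking = np.argsort(mean_importance)[::-1]  # rank features by the mean
print("Ensembled feature ranking:", ranking)
```

- Deep Learning for Choice Modeling [5.173001988341294]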
We develop deep learning-based choice models under two settings of choice modeling: feature-free and feature-based.
Our model captures both the intrinsic utility for each candidate choice and the effect that the assortment has on the choice probability.
arXiv Detail & Related papers (2022-08-19T13:10:17Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more valuable to understand the properties of a model and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain models with the desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- Model Selection's Disparate Impact in Real-World Deep Learning Applications [3.924854655504237]
Algorithmic fairness has emphasized the role of biased data in automated decision outcomes.
We contend that one source of such bias, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups.
arXiv Detail & Related papers (2021-04-01T16:37:01Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Model-specific Data Subsampling with Influence Functions [37.64859614131316]
We develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence.
Specifically, we leverage influence functions to guide our selection strategy, proving theoretically and demonstrating empirically that our approach quickly selects high-quality models (sketched below).
arXiv Detail & Related papers (2020-10-20T12:10:28Z)
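A minimal sketch of influence-guided subsampling in a setting where influence has a closed form, using ridge regression as a stand-in for the paper's models; the scoring rule and keep-fraction are illustrative choices, not the authors' exact method.

```python
# Score each training point by a classical influence-function approximation
# (residual magnitude times leverage) and keep the most influential half,
# rather than subsampling uniformly at random.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Closed-form ridge fit: theta = (X^T X + lam*I)^{-1} X^T y
H_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
theta = H_inv @ X.T @ y
residuals = X @ theta - y
leverage = np.einsum("ij,jk,ik->i", X, H_inv, X)  # x_i^T H^{-1} x_i
influence = np.abs(residuals) * leverage

keep = np.argsort(influence)[-n // 2:]  # indices of the most influential half
X_sub, y_sub = X[keep], y[keep]
print(f"kept {len(keep)} of {n} training points")
```

Points with larger influence scores move the fitted parameters more when removed, so training on them first tends to recover a high-quality model from a fraction of the data.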