Related papers: The Role of Hyperparameters in Predictive Multiplicity

The Role of Hyperparameters in Predictive Multiplicity

URL: http://arxiv.org/abs/2503.13506v1
Date: Thu, 13 Mar 2025 19:22:44 GMT
Title: The Role of Hyperparameters in Predictive Multiplicity
Authors: Mustafa Cavus, Katarzyna Woźnica, Przemysław Biecek,
Abstract summary: Different machine learning models trained on the same dataset yield divergent predictions for identical inputs.<n>These inconsistencies can seriously impact high-stakes decisions such as credit assessments, hiring, and medical diagnoses.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper investigates the critical role of hyperparameters in predictive multiplicity, where different machine learning models trained on the same dataset yield divergent predictions for identical inputs. These inconsistencies can seriously impact high-stakes decisions such as credit assessments, hiring, and medical diagnoses. Focusing on six widely used models for tabular data - Elastic Net, Decision Tree, k-Nearest Neighbor, Support Vector Machine, Random Forests, and Extreme Gradient Boosting - we explore how hyperparameter tuning influences predictive multiplicity, as expressed by the distribution of prediction discrepancies across benchmark datasets. Key hyperparameters such as lambda in Elastic Net, gamma in Support Vector Machines, and alpha in Extreme Gradient Boosting play a crucial role in shaping predictive multiplicity, often compromising the stability of predictions within specific algorithms. Our experiments on 21 benchmark datasets reveal that tuning these hyperparameters leads to notable performance improvements but also increases prediction discrepancies, with Extreme Gradient Boosting exhibiting the highest discrepancy and substantial prediction instability. This highlights the trade-off between performance optimization and prediction consistency, raising concerns about the risk of arbitrary predictions. These findings provide insight into how hyperparameter optimization leads to predictive multiplicity. While predictive multiplicity allows prioritizing domain-specific objectives such as fairness and reduces reliance on a single model, it also complicates decision-making, potentially leading to arbitrary or unjustified outcomes.

Related papers

Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets.<n>We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms.<n>Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
arXiv Detail & Related papers (2025-07-12T08:10:10Z)
Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm [14.980926991441345]
We show that datasets containing interventional data can be effectively extracted under realistic assumptions about the data distribution.<n>We introduce a novel variant of interventional faithfulness, which relies on comparisons between the marginal distributions of each variable across observational and interventional settings.<n>We also introduce Intersort, an algorithm designed to infer the causal order from datasets containing large numbers of single-variable interventions.
arXiv Detail & Related papers (2024-05-28T16:07:17Z)
Hyperparameter Importance Analysis for Multi-Objective AutoML [14.336028105614824]
In this paper, we propose the first method for assessing the importance of hyper parameters in multi-objective optimization tasks.<n>Specifically, we compute the a-priori scalarization of the objectives and determine the importance of the hyper parameters for different objective tradeoffs.
arXiv Detail & Related papers (2024-05-13T11:00:25Z)
Overparameterized Multiple Linear Regression as Hyper-Curve Fitting [0.0]
It is proven that a linear model will produce exact predictions even in the presence of nonlinear dependencies that violate the model assumptions. The hyper-curve approach is especially suited for the regularization of problems with noise in predictor variables and can be used to remove noisy and "improper" predictors from the model.
arXiv Detail & Related papers (2024-04-11T15:43:11Z)
Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features [1.2289361708127877]
We propose a novel hierarchical likelihood learning framework for introducing gamma random effects into a Poisson deep neural network. The proposed method simultaneously yields maximum likelihood estimators for fixed parameters and best unbiased predictors for random effects. State-of-the-art network architectures can be easily implemented into the proposed h-likelihood framework.
arXiv Detail & Related papers (2023-10-18T01:54:48Z)
Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
Causality-oriented robustness: exploiting general additive interventions [3.871660145364189]
In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG) In a linear setting, we prove that DRIG yields predictions that are robust among a data-dependent class of distribution shifts. We extend our approach to the semi-supervised domain adaptation setting to further improve prediction performance.
arXiv Detail & Related papers (2023-07-18T16:22:50Z)
Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters. EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting [61.02295959343446]
This work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules. We build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator to do both tasks of regression and uncertainty estimation. We apply the proposed framework to current SOTA multi-agent trajectory forecasting systems as a plugin module.
arXiv Detail & Related papers (2022-07-11T21:17:41Z)
Which Invariance Should We Transfer? A Causal Minimax Learning Approach [18.71316951734806]
We present a comprehensive minimax analysis from a causal perspective. We propose an efficient algorithm to search for the subset with minimal worst-case risk. The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease.
arXiv Detail & Related papers (2021-07-05T09:07:29Z)
Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees. We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
Probabilistic electric load forecasting through Bayesian Mixture Density Networks [70.50488907591463]
Probabilistic load forecasting (PLF) is a key component in the extended tool-chain required for efficient management of smart energy grids. We propose a novel PLF approach, framed on Bayesian Mixture Density Networks. To achieve reliable and computationally scalable estimators of the posterior distributions, both Mean Field variational inference and deep ensembles are integrated.
arXiv Detail & Related papers (2020-12-23T16:21:34Z)
Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data. We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.