Related papers: A Systematic Bias of Machine Learning Regression Models and Its Correction: an Application to Imaging-based Brain Age Prediction

URL: http://arxiv.org/abs/2405.15950v2
Date: Wed, 4 Sep 2024 15:08:49 GMT
Title: A Systematic Bias of Machine Learning Regression Models and Its Correction: an Application to Imaging-based Brain Age Prediction
Authors: Hwiyoung Lee, Shuo Chen
Abstract summary: Machine learning models for continuous outcomes often yield systematically biased predictions. Predictions for large-valued outcomes tend to be negatively biased (underestimating actual values), while those for small-valued outcomes are positively biased (overestimating actual values).
Score: 2.4894581801802227
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine learning models for continuous outcomes often yield systematically biased predictions, particularly for values that largely deviate from the mean. Specifically, predictions for large-valued outcomes tend to be negatively biased (underestimating actual values), while those for small-valued outcomes are positively biased (overestimating actual values). We refer to this linear central tendency warped bias as the "systematic bias of machine learning regression". In this paper, we first demonstrate that this systematic prediction bias persists across various machine learning regression models, and then delve into its theoretical underpinnings. To address this issue, we propose a general constrained optimization approach designed to correct this bias and develop computationally efficient implementation algorithms. Simulation results indicate that our correction method effectively eliminates the bias from the predicted outcomes. We apply the proposed approach to the prediction of brain age using neuroimaging data. In comparison to competing machine learning regression models, our method effectively addresses the longstanding issue of "systematic bias of machine learning regression" in neuroimaging-based brain age calculation, yielding unbiased predictions of brain age.
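The bias pattern described in the abstract can be reproduced in a few lines. The sketch below is an illustration only, not the authors' correction method: it assumes a simulated linear data-generating process and ordinary least squares as the regression model, and every name in it (`simulate_bias`, `noise_sd`, the coefficient 2.0) is hypothetical. It regresses the prediction error (predicted minus actual) on the actual outcome. For in-sample least squares with an intercept, that slope equals R^2 - 1, which is negative whenever the fit is imperfect, so large outcomes are underestimated and small ones overestimated.

```python
import numpy as np


def simulate_bias(n=2000, noise_sd=1.0, seed=0):
    """Fit OLS on simulated data and measure how prediction error varies with y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # Assumed data-generating process: linear signal plus Gaussian noise.
    y = 2.0 * x + rng.normal(scale=noise_sd, size=n)

    # Ordinary least squares with an intercept column.
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coef

    # Prediction error: negative values mean the outcome was underestimated.
    err = y_hat - y

    # Slope of the error regressed on the actual outcome.
    slope = np.cov(err, y)[0, 1] / np.var(y, ddof=1)

    # In-sample coefficient of determination.
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

    # Mean error in the bottom and top deciles of the actual outcome.
    lo_cut, hi_cut = np.quantile(y, [0.1, 0.9])
    mean_err_low = err[y <= lo_cut].mean()
    mean_err_high = err[y >= hi_cut].mean()
    return slope, r2, mean_err_low, mean_err_high


slope, r2, mean_err_low, mean_err_high = simulate_bias()
print(f"error-vs-outcome slope: {slope:.3f} (R^2 - 1 = {r2 - 1:.3f})")
print(f"mean error, lowest decile of y:  {mean_err_low:+.3f}")
print(f"mean error, highest decile of y: {mean_err_high:+.3f}")
```

Raising `noise_sd` lowers R^2 and makes the slope more negative, which matches the abstract's point that the bias is strongest for outcomes far from the mean.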

Related papers

Adversarial Debiasing for Unbiased Parameter Recovery [0.8749675983608172]
We show how prediction errors from machine learning models can lead to bias in the estimates of regression coefficients. We propose a test for detecting bias, and demonstrate the use of an adversarial machine learning algorithm in order to de-bias predictions. We conduct simulations and empirical exercises using ground truth and satellite data on forest cover in Africa.
arXiv Detail & Related papers (2025-02-17T20:54:56Z)
Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased [0.0]
Imbalanced binary classification problems arise in many fields of study. It is common to subsample the majority class to create a (more) balanced dataset for model training. This biases the model's predictions because the model learns from a dataset that does not follow the same data generating process as new data.
arXiv Detail & Related papers (2024-12-17T19:38:29Z)
Automatic debiasing of neural networks via moment-constrained learning [0.0]
Naively learning the regression function and taking a sample mean of the target functional results in biased estimators. We propose moment-constrained learning as a new RR learning approach that addresses some shortcomings in automatic debiasing.
arXiv Detail & Related papers (2024-09-29T20:56:54Z)
Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios. Existing debiasing methods suffer from high costs in bias labeling or model re-training. We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients. Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
arXiv Detail & Related papers (2023-01-23T18:59:28Z)
Using Pareto Simulated Annealing to Address Algorithmic Bias in Machine Learning [2.055949720959582]
We present a multi-objective optimisation strategy that optimises for both balanced accuracy and underestimation. We demonstrate the effectiveness of this strategy on one synthetic and two real-world datasets.
arXiv Detail & Related papers (2021-05-31T15:51:43Z)
Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in NLP model updates. We formulate regression-free model updates as a constrained optimization problem. We empirically analyze how model ensembles reduce regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions. We make robust and efficient counterfactual predictions for both individual and average treatment effects. The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network. Our analysis sheds light on why deep neural networks perform poorly under adversarial perturbation. We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
arXiv Detail & Related papers (2020-08-01T00:58:54Z)
Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge. We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously. Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
A Locally Adaptive Interpretable Regression [7.4267694612331905]
Linear regression is one of the most interpretable prediction models. In this work, we introduce a locally adaptive interpretable regression (LoAIR). Our model achieves comparable or better predictive performance than the other state-of-the-art baselines.
arXiv Detail & Related papers (2020-05-07T09:26:14Z)
Random Machines Regression Approach: an ensemble support vector regression model with free kernel choice [0.0]
In this article we propose a procedure for applying the bagged-weighted support vector model to regression problems. Regression Random Machines performed well, achieving lower generalization error without needing to choose the best kernel function during the tuning process.
arXiv Detail & Related papers (2020-03-27T21:30:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.