Pathological Regularization Regimes in Classification Tasks
- URL: http://arxiv.org/abs/2406.14731v1
- Date: Thu, 20 Jun 2024 20:54:06 GMT
- Title: Pathological Regularization Regimes in Classification Tasks
- Authors: Maximilian Wiesmann, Paul Larsen
- Abstract summary: We show the possibility of a trend reversal in binary classification tasks between the dataset and a classification score obtained from a trained model.
This trend reversal occurs for certain choices of the regularization parameter for model training, namely, if the parameter is contained in what we call the pathological regularization regime.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper we demonstrate the possibility of a trend reversal in binary classification tasks between the dataset and a classification score obtained from a trained model. This trend reversal occurs for certain choices of the regularization parameter for model training, namely, if the parameter is contained in what we call the pathological regularization regime. For ridge regression, we give necessary and sufficient algebraic conditions on the dataset for the existence of a pathological regularization regime. Moreover, our results provide a data science practitioner with a hands-on tool to avoid hyperparameter choices suffering from trend reversal. We furthermore present numerical results on pathological regularization regimes for logistic regression. Finally, we draw connections to datasets exhibiting Simpson's paradox, providing a natural source of pathological datasets.
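The abstract's idea of a regularization parameter changing the direction of a trend can be illustrated with a small ridge regression sketch. The code below is not the paper's algebraic criterion. It assumes one plausible reading of "trend reversal": the sign of a feature's ridge coefficient disagrees with the sign of that feature's marginal covariance with the label. The function names, the toy dataset, and the grid of regularization values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution (X^T X + lam * I)^{-1} X^T y, no intercept."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def sign_disagreements(X, y, feature, lambdas):
    """Return the lambdas at which the ridge coefficient of `feature` has a
    different sign than the marginal covariance between `feature` and y.

    Assumption: this sign comparison stands in for the paper's notion of a
    trend reversal; the paper's formal definition may differ.
    """
    Xc = X - X.mean(axis=0)  # centre the features so no intercept is needed
    yc = y - y.mean()        # centre the labels
    marginal_sign = np.sign(Xc[:, feature] @ yc)
    return [
        lam
        for lam in lambdas
        if np.sign(ridge_coefficients(Xc, yc, lam)[feature]) != marginal_sign
    ]


# Hypothetical toy dataset with a Simpson-like structure: feature 1 splits the
# data into two groups; within each group the label decreases with feature 0,
# while across the whole dataset it increases with feature 0.
X = np.array(
    [[0, 0], [0, 0], [1, 0], [1, 0], [2, 1], [2, 1], [3, 1], [3, 1]],
    dtype=float,
)
y = np.array([0, 1, 0, 0, 1, 1, 1, 0], dtype=float)

lambdas = [0.0, 0.5, 1.0, 4.0, 10.0]
flagged = sign_disagreements(X, y, feature=0, lambdas=lambdas)
print(flagged)
```

On this toy dataset the coefficient of feature 0 changes sign as the regularization parameter grows, so only part of the grid is flagged. Scanning a grid this way is a numerical check only; the paper states necessary and sufficient algebraic conditions for ridge regression, which this sketch does not reproduce.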
Related papers
- Prevalidated ridge regression is a highly-efficient drop-in replacement for logistic regression for high-dimensional data [7.532661545437305]
We present a prevalidated ridge regression model that matches logistic regression in terms of classification error and log-loss.
We scale the coefficients of the model so as to minimise log-loss for a set of prevalidated predictions.
This exploits quantities already computed in the course of fitting the ridge regression model in order to find the scaling parameter with nominal additional computational expense.
arXiv Detail & Related papers (2024-01-28T09:38:14Z)
- Kalman Filter for Online Classification of Non-Stationary Data [101.26838049872651]
In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments in multi-class classification we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
arXiv Detail & Related papers (2023-06-14T11:41:42Z)
- Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z)
- Provable Guarantees for Sparsity Recovery with Deterministic Missing Data Patterns [30.553697242038233]
We consider the case in which the observed dataset is censored by a deterministic, non-uniform filter.
We propose an efficient algorithm for missing value imputation by utilizing the topological property of the censorship filter.
arXiv Detail & Related papers (2022-06-10T06:14:45Z)
- Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression [2.520538806201793]
In traditional logistic regression models, the link function is often assumed to be linear and continuous in predictors.
We consider a threshold model in which all continuous features are discretized into ordinal levels, which in turn determine the binary responses.
We find the lasso model is well suited to the problem of early detection and prediction of chronic diseases such as diabetes.
arXiv Detail & Related papers (2022-02-17T04:16:40Z)
- Continuously Generalized Ordinal Regression for Linear and Deep Models [41.03778663275373]
Ordinal regression is a classification task where classes have an order and prediction error increases the further the predicted class is from the true class.
We propose a new approach for modeling ordinal data that allows class-specific hyperplane slopes.
Our method significantly outperforms the standard ordinal logistic model over a thorough set of ordinal regression benchmark datasets.
arXiv Detail & Related papers (2022-02-14T19:49:05Z)
- An Optimal Control Approach to Learning in SIDARTHE Epidemic model [67.22168759751541]
We propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data.
We forecast the epidemic evolution in Italy and France.
arXiv Detail & Related papers (2020-10-28T10:58:59Z)
- Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
- Generalisation error in learning with random features and the hidden manifold model [23.71637173968353]
We study generalised linear regression and classification for a synthetically generated dataset.
We consider the high-dimensional regime and use the replica method from statistical physics.
We show how to obtain the so-called double descent behaviour for logistic regression with a peak at the threshold.
We discuss the role played by correlations in the data generated by the hidden manifold model.
arXiv Detail & Related papers (2020-02-21T14:49:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.