SplitWise Regression: Stepwise Modeling with Adaptive Dummy Encoding
- URL: http://arxiv.org/abs/2505.15423v1
- Date: Wed, 21 May 2025 12:06:43 GMT
- Title: SplitWise Regression: Stepwise Modeling with Adaptive Dummy Encoding
- Authors: Marcell T. Kurbucz, Nikolaos Tzivanakis, Nilufer Sari Aslam, Adam M. Sykulski
- Abstract summary: SplitWise is a novel framework that enhances stepwise regression. It adaptively transforms numeric predictors into threshold-based binary features using shallow decision trees. It is evaluated on both synthetic and real-world datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capturing nonlinear relationships without sacrificing interpretability remains a persistent challenge in regression modeling. We introduce SplitWise, a novel framework that enhances stepwise regression. It adaptively transforms numeric predictors into threshold-based binary features using shallow decision trees, but only when such transformations improve model fit, as assessed by the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). This approach preserves the transparency of linear models while flexibly capturing nonlinear effects. Implemented as a user-friendly R package, SplitWise is evaluated on both synthetic and real-world datasets. The results show that it consistently produces more parsimonious and generalizable models than traditional stepwise and penalized regression techniques.
Related papers
- A Cycle-Consistency Constrained Framework for Dynamic Solution Space Reduction in Noninjective Regression [2.775364659317507]
This paper proposes a cycle consistency-based data-driven training framework. Experiments on normalized synthetic and simulated datasets demonstrate that the proposed method achieves a cycle reconstruction error below 0.003. The framework significantly reduces reliance on manual intervention, demonstrating potential advantages in non-injective regression tasks.
arXiv Detail & Related papers (2025-07-07T04:28:01Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Augmented Regression Models using Neurochaos Learning [1.534667887016089]
We present novel Augmented Regression Models using Neurochaos Learning (NL), where Tracemean features derived from the Neurochaos Learning framework are integrated with traditional regression algorithms. Our approach was evaluated using ten diverse real-life datasets and a synthetically generated dataset of the form $y = mx + c + \epsilon$.
arXiv Detail & Related papers (2025-05-19T11:02:14Z) - Branch and Bound to Assess Stability of Regression Coefficients in Uncertain Models [0.6990493129893112]
We introduce our algorithm, along with supporting mathematical results, an example application, and a link to our computer code.
It helps researchers summarize high-dimensional data and assess the stability of regression coefficients in uncertain models.
arXiv Detail & Related papers (2024-08-19T01:37:14Z) - Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously.<n>In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
arXiv Detail & Related papers (2024-02-02T16:35:51Z) - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of the regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
arXiv Detail & Related papers (2020-11-30T21:47:38Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - A Locally Adaptive Interpretable Regression [7.4267694612331905]
Linear regression is one of the most interpretable prediction models.
In this work, we introduce a locally adaptive interpretable regression (LoAIR).
Our model achieves comparable or better predictive performance than the other state-of-the-art baselines.
arXiv Detail & Related papers (2020-05-07T09:26:14Z) - Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)