Related papers: Optimally Weighted Ensembles of Regression Models: Exact Weight Optimization and Applications


URL: http://arxiv.org/abs/2206.11263v1
Date: Wed, 22 Jun 2022 09:11:14 GMT
Title: Optimally Weighted Ensembles of Regression Models: Exact Weight Optimization and Applications
Authors: Patrick Echtenbruck, Martina Echtenbruck, Joost Batenburg, Thomas Bäck, Boris Naujoks, Michael Emmerich
Abstract summary: We show that combining different regression models can yield better results than selecting a single ('best') regression model. We outline an efficient method that obtains an optimally weighted linear combination from a heterogeneous set of regression models.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated model selection is often proposed to users to choose which machine learning model (or method) to apply to a given regression task. In this paper, we show that combining different regression models can yield better results than selecting a single ('best') regression model, and outline an efficient method that obtains an optimally weighted convex linear combination from a heterogeneous set of regression models. More specifically, in this paper, a heuristic weight optimization, used in a preceding conference paper, is replaced by an exact optimization algorithm using convex quadratic programming. We prove convexity of the quadratic programming formulation for the straightforward formulation and for a formulation with weighted data points. The novel weight optimization is not only (more) exact but also more efficient. The methods we develop in this paper are implemented and made available as open source on GitHub. They can be executed on commonly available hardware and offer a transparent and easy-to-interpret interface. The results indicate that the approach outperforms model selection methods on a range of data sets, including data sets with mixed variable types from drug discovery applications.
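The weight optimization described in the abstract can be illustrated with a small sketch. Assume p_ij is the prediction of model j on data point i, y_i the target and d_i >= 0 an optional data-point weight. The weights then solve: minimize sum_i d_i (sum_j w_j p_ij - y_i)^2 subject to w_j >= 0 and sum_j w_j = 1. The Hessian of this objective, 2 P^T D P, is positive semidefinite whenever all d_i >= 0, which is why both the plain and the data-point-weighted formulations are convex QPs. The Python below is a minimal, hypothetical sketch under those assumptions, not the authors' GitHub implementation. Instead of calling a general QP solver, it solves the same QP exactly by enumerating candidate supports of w and solving the equality-constrained KKT system on each. This is only practical for a small number of models. The names `optimal_convex_weights` and `point_weights` are illustrative.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def _solve(a: List[List[float]], rhs: List[float]) -> Optional[List[float]]:
    """Solve a small dense linear system by Gaussian elimination with partial
    pivoting. Returns None if the matrix is (numerically) singular."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def optimal_convex_weights(
    preds: Sequence[Sequence[float]],  # preds[i][j]: prediction of model j on point i
    y: Sequence[float],
    point_weights: Optional[Sequence[float]] = None,  # hypothetical d_i >= 0
) -> Tuple[List[float], float]:
    """Hypothetical sketch: exact convex ensemble weights for a few models by
    support enumeration (not the paper's solver)."""
    n, k = len(y), len(preds[0])
    d = list(point_weights) if point_weights is not None else [1.0] * n

    def loss(w: List[float]) -> float:
        # Weighted squared error of the combined prediction.
        return sum(d[i] * (sum(preds[i][j] * w[j] for j in range(k)) - y[i]) ** 2
                   for i in range(n))

    # Quadratic objective w^T G w - 2 b^T w + const with G = P^T D P, b = P^T D y.
    g = [[sum(d[i] * preds[i][j] * preds[i][l] for i in range(n)) for l in range(k)]
         for j in range(k)]
    b = [sum(d[i] * preds[i][j] * y[i] for i in range(n)) for j in range(k)]

    best_w: List[float] = []
    best_loss = float("inf")
    for size in range(1, k + 1):
        for support in combinations(range(k), size):
            # KKT system [G_S 1; 1^T 0][w_S; mu] = [b_S; 1] enforces sum(w_S) = 1.
            kkt = [[g[j][l] for l in support] + [1.0] for j in support]
            kkt.append([1.0] * size + [0.0])
            sol = _solve(kkt, [b[j] for j in support] + [1.0])
            if sol is None or min(sol[:size]) < -1e-12:
                continue  # singular system or violates w >= 0
            w = [0.0] * k
            for j, v in zip(support, sol[:size]):
                w[j] = max(v, 0.0)
            cur = loss(w)
            if cur < best_loss:
                best_w, best_loss = w, cur
    return best_w, best_loss
```

For example, two models that predict [1, 1] and [-1, -1] for targets [0, 0] each have a squared error of 2 on their own, while the weights [0.5, 0.5] give an error of 0. This illustrates the abstract's point that a weighted combination can beat the best single model. Because the loss is recomputed for every feasible candidate, near-singular supports can only produce valid upper bounds, never a wrong "optimum" below the true minimum.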

Related papers

pared: Model selection using multi-objective optimization [0.351124620232225]
We present the R package pared to enable the use of multi-objective optimization for model selection. Our approach entails the use of Gaussian process-based optimization to efficiently identify solutions that represent desirable trade-offs.
arXiv Detail & Related papers (2025-05-27T20:20:04Z)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Data-Informed Model Complexity Metric for Optimizing Symbolic Regression Models [0.0]
We introduce a pragmatic method to estimate model complexity using Hessian rank for post-processing selection. This method aligns model selection with input data complexity, calculated using intrinsic dimensionality (ID) estimators.
arXiv Detail & Related papers (2025-01-29T01:53:22Z)
An Iterative Bayesian Approach for System Identification based on Linear Gaussian Models [86.05414211113627]
We tackle the problem of system identification, where we select inputs, observe the corresponding outputs from the true system, and optimize the parameters of our model to best fit the data. We propose a flexible and computationally tractable methodology that is compatible with any system and parametric family of models.
arXiv Detail & Related papers (2025-01-28T01:57:51Z)
Efficient Optimization Algorithms for Linear Adversarial Training [9.933836677441684]
Adversarial training can be used to learn models that are robust against perturbations. We propose tailored optimization algorithms for the adversarial training of linear models.
arXiv Detail & Related papers (2024-10-16T15:41:08Z)
Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously. In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
arXiv Detail & Related papers (2024-02-02T16:35:51Z)
Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization. We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models [1.3236116985407258]
Best subset selection in high-dimensional models is known to be computationally intractable. We propose the first provably scalable algorithm for best subset selection in high-dimensional SIMs. Our algorithm enjoys the subset selection consistency and has the oracle property with a high probability.
arXiv Detail & Related papers (2023-09-12T13:48:06Z)
Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching. Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
A model-free feature selection technique of feature screening and random forest based recursive feature elimination [0.0]
We propose a model-free feature selection method for ultra-high dimensional data with mass features. We show that the proposed method is selection consistent and $L$ consistent under weak regularity conditions.
arXiv Detail & Related papers (2023-02-15T03:39:16Z)
Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are made by using plug-in empirical Bayes estimates. The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression. It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise. This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
Personalizing Performance Regression Models to Black-Box Optimization Problems [0.755972004983746]
In this work, we propose a personalized regression approach for numerical optimization problems. We also investigate the impact of selecting not a single regression model per problem, but personalized ensembles. We test our approach on predicting the performance of numerical optimizations on the BBOB benchmark collection.
arXiv Detail & Related papers (2021-04-22T11:47:47Z)
MINA: Convex Mixed-Integer Programming for Non-Rigid Shape Alignment [77.38594866794429]
We present a convex mixed-integer programming formulation for non-rigid shape matching. We propose a novel shape deformation model based on an efficient low-dimensional discrete model.
arXiv Detail & Related papers (2020-02-28T09:54:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.