Related papers: Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation

URL: http://arxiv.org/abs/2403.19720v1
Date: Wed, 27 Mar 2024 21:18:43 GMT
Title: Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation
Authors: Yanhao Jin, Krishnakumar Balasubramanian, Debashis Paul
Abstract summary: We consider meta-learning within the framework of high-dimensional random-effects linear models. We characterize the precise asymptotic behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task. We propose and analyze an estimator of the inverse covariance matrix of the random regression coefficients based on data from the training tasks.
Score: 14.194212772887699
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Meta-learning involves training models on a variety of training tasks in a way that enables them to generalize well on new, unseen test tasks. In this work, we consider meta-learning within the framework of high-dimensional multivariate random-effects linear models and study generalized ridge-regression based predictions. The statistical intuition of using generalized ridge regression in this setting is that the covariance structure of the random regression coefficients could be leveraged to make better predictions on new tasks. Accordingly, we first characterize the precise asymptotic behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task. We next show that this predictive risk is optimal when the weight matrix in generalized ridge regression is chosen to be the inverse of the covariance matrix of random coefficients. Finally, we propose and analyze an estimator of the inverse covariance matrix of random regression coefficients based on data from the training tasks. As opposed to intractable MLE-type estimators, the proposed estimators could be computed efficiently as they could be obtained by solving (global) geodesically-convex optimization problems. Our analysis and methodology use tools from random matrix theory and Riemannian optimization. Simulation results demonstrate the improved generalization performance of the proposed method on new unseen test tasks within the considered framework.
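
The abstract's central claim, that the weight matrix in generalized ridge regression should be the inverse covariance of the random coefficients, can be illustrated with a small sketch. This is not the authors' code: the objective scaling (squared error divided by n plus a quadratic penalty), the diagonal hyper-covariance `Omega`, the isotropic Gaussian design, and the helper names `generalized_ridge` and `simulate_task` are all assumptions made for illustration.

```python
import numpy as np


def generalized_ridge(X, y, A, lam):
    """Minimize ||y - X b||^2 / n + lam * b^T A b over b (assumed objective)."""
    n = X.shape[0]
    # Normal equations of the objective above: (X^T X + n * lam * A) b = X^T y
    return np.linalg.solve(X.T @ X + n * lam * A, X.T @ y)


def simulate_task(rng, n, Omega, noise_sd=1.0):
    """Draw one task from a random-effects linear model (hypothetical setup)."""
    p = Omega.shape[0]
    beta = rng.multivariate_normal(np.zeros(p), Omega)  # task-specific coefficients
    X = rng.standard_normal((n, p))                     # isotropic Gaussian design
    y = X @ beta + noise_sd * rng.standard_normal(n)
    return X, y, beta


rng = np.random.default_rng(0)
p, n = 20, 40
Omega = np.diag(np.linspace(0.1, 2.0, p))  # hypothetical hyper-covariance

X, y, beta = simulate_task(rng, n, Omega)

# Weight matrix set to the inverse hyper-covariance, as the abstract recommends.
b_opt = generalized_ridge(X, y, np.linalg.inv(Omega), lam=1.0 / n)

# Ordinary ridge (identity weight matrix) for comparison.
b_ridge = generalized_ridge(X, y, np.eye(p), lam=1.0 / n)
```

With unit noise variance and `lam = 1 / n`, the first estimator coincides with the posterior mean of the coefficients under a Gaussian prior with covariance `Omega`, which gives some intuition for the optimality result. In the paper the inverse covariance is not known and is estimated from the training tasks; the sketch simply plugs in the true `Omega`.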

Related papers

Revisiting Randomization in Greedy Model Search [16.15551706774035]
We propose and analyze an ensemble of greedy forward selection estimators that are randomized by feature subsampling. We design a novel implementation based on dynamic programming that greatly improves its computational efficiency. Contrary to prevailing belief that randomized ensembling is analogous to shrinkage, we show that it can simultaneously reduce training error and degrees of freedom.
arXiv Detail & Related papers (2025-06-18T17:13:53Z)
A Simplified Analysis of SGD for Linear Regression with Weight Averaging [64.2393952273612]
Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression using a constant learning rate. We provide a simplified analysis recovering the same bias and variance bounds provided in Zou et al. (2021), based on simple linear algebra tools. We believe our work makes the analysis of gradient descent on linear regression very accessible and will be helpful in further analyzing mini-batching and learning rate scheduling.
arXiv Detail & Related papers (2025-06-18T15:10:38Z)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
A Provably Accurate Randomized Sampling Algorithm for Logistic Regression [2.7930955543692817]
We present a simple, randomized sampling-based algorithm for the logistic regression problem. We prove that accurate approximations can be achieved with a sample whose size is much smaller than the total number of observations. Overall, our work sheds light on the potential of using randomized sampling approaches to efficiently approximate the estimated probabilities in logistic regression.
arXiv Detail & Related papers (2024-02-26T06:20:28Z)
A Novel Approach in Solving Stochastic Generalized Linear Regression via Nonconvex Programming [1.6874375111244329]
This paper considers a generalized linear regression model as a problem with chance constraints. The results of the proposed algorithm were over 1 to 2 percent better than the ordinary logistic regression model.
arXiv Detail & Related papers (2024-01-16T16:45:51Z)
Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions. Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation. In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates. The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
Robust Regularized Low-Rank Matrix Models for Regression and Classification [14.698622796774634]
We propose a framework for matrix variate regression models based on a rank constraint, vector regularization (e.g., sparsity), and a general loss function. We show that the algorithm is guaranteed to converge and that all accumulation points of the algorithm have estimation errors of order $O(1/\sqrt{n})$ asymptotically, substantially attaining the minimax rate.
arXiv Detail & Related papers (2022-05-14T18:03:48Z)
Regression-based projection for learning Mori-Zwanzig operators [0.0]
We propose to adopt statistical regression as the projection operator to enable data-driven learning of the operators in the Mori-Zwanzig formalism. We show that the choice of linear regression results in a recently proposed data-driven learning algorithm based on Mori's projection operator.
arXiv Detail & Related papers (2022-05-10T19:35:47Z)
An Extended Multi-Model Regression Approach for Compressive Strength Prediction and Optimization of a Concrete Mixture [0.0]
A model-based evaluation of concrete compressive strength is of high value, both for strength prediction and for mixture optimization. We take a further step towards improving the accuracy of the prediction model via the weighted combination of multiple regression methods. A GA-based mixture optimization is proposed, building on the obtained multi-regression model.
arXiv Detail & Related papers (2021-06-13T16:10:32Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
Optimal Feature Manipulation Attacks Against Linear Regression [64.54500628124511]
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or modifying the original data points. Given the energy budget, we first provide the closed-form solution of the optimal poisoning data point when our target is modifying one designated regression coefficient. We then extend the analysis to the more challenging scenario where the attacker aims to change one particular regression coefficient while changing the others as little as possible.
arXiv Detail & Related papers (2020-02-29T04:26:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.