Multiple Descent in the Multiple Random Feature Model
- URL: http://arxiv.org/abs/2208.09897v3
- Date: Tue, 10 Oct 2023 08:15:32 GMT
- Title: Multiple Descent in the Multiple Random Feature Model
- Authors: Xuran Meng, Jianfeng Yao, Yuan Cao
- Abstract summary: We investigate the multiple descent phenomenon in a class of multi-component prediction models.
We show that risk curves with a specific number of descents generally exist in learning multi-component prediction models.
- Score: 8.988540634325691
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent works have demonstrated a double descent phenomenon in
over-parameterized learning. Although widely studied, this phenomenon is not
yet fully understood in theory. In this paper, we
investigate the multiple descent phenomenon in a class of multi-component
prediction models. We first consider a ''double random feature model'' (DRFM)
concatenating two types of random features, and study the excess risk achieved
by the DRFM in ridge regression. We calculate the precise limit of the excess
risk under the high dimensional framework where the training sample size, the
dimension of data, and the dimension of random features tend to infinity
proportionally. Based on the calculation, we further theoretically demonstrate
that the risk curves of DRFMs can exhibit triple descent. We then provide a
thorough experimental study to verify our theory. At last, we extend our study
to the ''multiple random feature model'' (MRFM), and show that MRFMs ensembling
$K$ types of random features may exhibit $(K+1)$-fold descent. Our analysis
points out that risk curves with a specific number of descents generally exist
in learning multi-component prediction models.
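As a rough illustration of the DRFM setup (not the paper's exact construction), the sketch below concatenates two hypothetical random feature types (ReLU and cosine), fits ridge regression in the joint feature space, and uses test MSE on noiseless labels as a crude proxy for excess risk; all dimensions, the regularization level, and the feature choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a linear target observed with noise.
n, d = 200, 50
X = rng.standard_normal((n, d)) / np.sqrt(d)
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)
X_test = rng.standard_normal((1000, d)) / np.sqrt(d)
y_test = X_test @ beta  # noiseless test labels -> test MSE proxies excess risk

def make_drfm(d, N1, N2, rng):
    """Return a feature map concatenating two random feature types."""
    W1 = rng.standard_normal((d, N1))  # weights for ReLU features
    W2 = rng.standard_normal((d, N2))  # weights for cosine features
    def phi(Z):
        F1 = np.maximum(Z @ W1, 0.0)   # first feature type
        F2 = np.cos(Z @ W2)            # second feature type
        return np.hstack([F1, F2]) / np.sqrt(N1 + N2)
    return phi

def ridge_risk(phi, lam=1e-4):
    """Fit ridge regression in feature space; return test MSE."""
    F, Ft = phi(X), phi(X_test)
    N = F.shape[1]
    w = np.linalg.solve(F.T @ F + lam * np.eye(N), F.T @ y)
    return np.mean((Ft @ w - y_test) ** 2)

# Sweep the feature widths; per the paper, with two feature types the
# risk curve (as a function of width) can exhibit more than one peak.
for N1 in [20, 100, 400]:
    phi = make_drfm(d, N1, N1, rng)
    print(N1, ridge_risk(phi))
```

Reproducing the triple-descent curve itself requires the paper's precise high-dimensional scaling of sample size, data dimension, and the two feature dimensions; this snippet only shows the mechanics of fitting a two-component random feature model.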
Related papers
- Bayesian Double Descent [0.6906005491572398]
We show a natural Bayesian interpretation of the double descent effect.
We show that it is not in conflict with the traditional Occam's razor that Bayesian models possess.
We illustrate the approach with an example of Bayesian model selection in neural networks.
arXiv Detail & Related papers (2025-07-09T23:47:26Z) - High-dimensional ridge regression with random features for non-identically distributed data with a variance profile [0.0]
The behavior of the random feature model in the high-dimensional regression framework has become a popular issue of interest in the machine learning literature.
We study the performances of the random features model in the setting of non-iid feature vectors.
arXiv Detail & Related papers (2025-04-03T21:20:08Z) - von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
arXiv Detail & Related papers (2024-02-05T12:31:18Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We re-examine when and where double descent occurs, and show that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative
Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Asymptotics of Bayesian Uncertainty Estimation in Random Features
Regression [1.170951597793276]
We focus on the variance of the posterior predictive distribution (Bayesian model average) and compare it to the risk of the MAP estimator.
The two agree when the number of samples grows faster than any constant multiple of the model dimensions.
arXiv Detail & Related papers (2023-06-06T15:36:15Z) - Precise Asymptotic Analysis of Deep Random Feature Models [37.35013316704277]
We provide exact expressions for the performance of regression by an $L$-layer deep random feature (RF) model.
We characterize the variation of the eigendistribution in different layers of the equivalent Gaussian model.
arXiv Detail & Related papers (2023-02-13T09:30:25Z) - Simplex Random Features [53.97976744884616]
We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels.
We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels.
We show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.
arXiv Detail & Related papers (2023-01-31T18:53:39Z) - Mitigating multiple descents: A model-agnostic framework for risk
monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z) - On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective.
We find that in case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z) - Model-based micro-data reinforcement learning: what are the crucial
model properties and which model to choose? [0.2836066255205732]
We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models.
We find that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin.
We also find that deterministic models are on par; in fact, they consistently (though not significantly) outperform their probabilistic counterparts.
arXiv Detail & Related papers (2021-07-24T11:38:25Z) - Asymptotic Risk of Overparameterized Likelihood Models: Double Descent
Theory for Deep Neural Networks [12.132641563193582]
We investigate the risk of a general class of overparameterized likelihood models, including deep models.
We demonstrate that several explicit models, such as parallel deep neural networks and ensemble learning, are in agreement with our theory.
arXiv Detail & Related papers (2021-02-28T13:02:08Z)
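Several entries above concern random feature approximations of kernels (Simplex Random Features, for instance, refine how the random frequencies are drawn). As background, here is a minimal sketch of the classic random Fourier feature mechanism of Rahimi and Recht for unbiased estimation of the Gaussian kernel; the bandwidth, sample sizes, and feature count are illustrative assumptions, not values from any of the papers listed:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_gaussian(X, Y, num_feats, rng, sigma=1.0):
    """Unbiased Monte Carlo estimate of the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    via random Fourier features with i.i.d. Gaussian frequencies."""
    d = X.shape[1]
    W = rng.standard_normal((d, num_feats)) / sigma   # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, num_feats)      # random phases
    phi = lambda Z: np.sqrt(2.0 / num_feats) * np.cos(Z @ W + b)
    return phi(X) @ phi(Y).T  # approximates the kernel Gram matrix

# Compare the estimate against the exact Gaussian kernel on toy points.
X = rng.standard_normal((5, 3))
K_hat = rff_gaussian(X, X, 20000, rng)
K_true = np.exp(-0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
print(np.max(np.abs(K_hat - K_true)))  # shrinks as num_feats grows
```

SimRFs keep this unbiasedness while correlating the frequency draws to reduce the estimator's variance; the i.i.d. scheme above is the baseline they improve on.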
This list is automatically generated from the titles and abstracts of the papers in this site.