Tradeoff of generalization error in unsupervised learning
- URL: http://arxiv.org/abs/2303.05718v2
- Date: Tue, 12 Sep 2023 16:12:14 GMT
- Title: Tradeoff of generalization error in unsupervised learning
- Authors: Gilhan Kim, Hojun Lee, Junghyo Jo, Yongjoo Baek
- Abstract summary: Unsupervised learning generally exhibits a two-component tradeoff of the generalization error (GE).
Using a more complex model reduces the model error at the cost of the data error, with the data error playing a more significant role for a smaller training dataset.
Our results also indicate that the optimal model tends to be more complex when the data to be learned are more complex.
- Score: 0.6554326244334868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finding the optimal model complexity that minimizes the generalization error
(GE) is a key issue of machine learning. For the conventional supervised
learning, this task typically involves the bias-variance tradeoff: lowering the
bias by making the model more complex entails an increase in the variance.
Meanwhile, little has been studied about whether the same tradeoff exists for
unsupervised learning. In this study, we propose that unsupervised learning
generally exhibits a two-component tradeoff of the GE, namely the model error
and the data error -- using a more complex model reduces the model error at the
cost of the data error, with the data error playing a more significant role for
a smaller training dataset. This is corroborated by training the restricted
Boltzmann machine to generate the configurations of the two-dimensional Ising
model at a given temperature and the totally asymmetric simple exclusion
process with given entry and exit rates. Our results also indicate that the
optimal model tends to be more complex when the data to be learned are more
complex.
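As a concrete illustration of the setup described in the abstract, the sketch below trains a binary-binary restricted Boltzmann machine with one-step contrastive divergence (CD-1) on 2D Ising configurations generated by Metropolis sampling. This is a minimal reconstruction for illustration only, not the authors' code: the lattice size, temperature, number of hidden units, learning rate, batch size, and sample counts are arbitrary choices, and spins are mapped from {-1, +1} to {0, 1} to serve as binary visible units.

```python
# Illustrative sketch only: RBM trained by CD-1 on Metropolis-sampled 2D Ising data.
# All hyperparameters below are arbitrary choices, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

def sample_ising(L=8, T=2.5, n_samples=500, n_burnin=200):
    """Metropolis sampling of the 2D Ising model (spins +/-1, J=1, no external field)."""
    s = rng.choice([-1, 1], size=(L, L))
    samples = []
    for sweep in range(n_burnin + n_samples):
        for _ in range(L * L):                      # one sweep = L*L single-spin updates
            i, j = rng.integers(L), rng.integers(L)
            nb = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
            dE = 2 * s[i, j] * nb                   # energy change of flipping spin (i, j)
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                s[i, j] *= -1
        if sweep >= n_burnin:                       # keep one configuration per sweep after burn-in
            samples.append(((s + 1) // 2).ravel().copy())   # map {-1, +1} -> {0, 1}
    return np.array(samples, dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=16, lr=0.05, epochs=30, batch=50):
    """CD-1 training of a binary-binary RBM; returns weights and biases."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)                         # visible bias
    c = np.zeros(n_hidden)                          # hidden bias
    for _ in range(epochs):
        rng.shuffle(data)                           # shuffle rows in place each epoch
        for start in range(0, len(data), batch):
            v0 = data[start:start + batch]
            ph0 = sigmoid(v0 @ W + c)               # positive phase: p(h = 1 | v0)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + b)             # one Gibbs step back to the visible layer
            v1 = (rng.random(pv1.shape) < pv1).astype(float)
            ph1 = sigmoid(v1 @ W + c)
            W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)   # CD-1 gradient estimate
            b += lr * (v0 - v1).mean(axis=0)
            c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

data = sample_ising()
W, b, c = train_rbm(data)
print(f"trained an RBM with {W.shape[1]} hidden units on {data.shape[0]} Ising configurations")
```

Sweeping the number of hidden units while holding the training-set size fixed is the kind of experiment under which the model-error/data-error tradeoff described in the abstract would be probed.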
Related papers
- Operator Inference Aware Quadratic Manifolds with Isotropic Reduced Coordinates for Nonintrusive Model Reduction [0.9586962006710554]
We propose a greedy training procedure that takes into account both the reconstruction error on the snapshot data and the prediction error of reduced models fitted to the data.
arXiv Detail & Related papers (2025-07-28T01:45:55Z) - Overfitting has a limitation: a model-independent generalization error bound based on Rényi entropy [5.011668820933222]
We introduce a model-independent upper bound on the generalization error applicable to algorithms whose outputs are determined solely by the data's histogram. This framework offers a direct explanation for the phenomenon where generalization performance degrades significantly upon injecting random noise into the data. We adapt the no-free-lunch theorem to be data-distribution-dependent, demonstrating that an amount of data corresponding to the Rényi entropy is indeed essential for successful learning (see the sketch of the Rényi entropy of a data histogram after this list).
arXiv Detail & Related papers (2025-05-30T19:41:37Z) - A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops [55.07063067759609]
High-quality data is essential for training large generative models, yet the vast reservoir of real data available online has become nearly depleted.
Models increasingly generate their own data for further training, forming Self-consuming Training Loops (STLs).
Some models degrade or even collapse, while others successfully avoid these failures, leaving a significant gap in theoretical understanding.
arXiv Detail & Related papers (2025-02-26T06:18:13Z) - Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning [6.278498348219108]
We revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom.
We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments.
arXiv Detail & Related papers (2024-10-02T06:09:57Z) - Aliasing and Label-Independent Decomposition of Risk: Beyond the bias-variance trade-off [0.0]
A central problem in data science is to use potentially noisy samples to predict function values for unseen inputs.
We introduce an alternative paradigm called the generalized aliasing decomposition (GAD).
GAD can be explicitly calculated from the relationship between model class and samples without seeing any data labels.
arXiv Detail & Related papers (2024-08-15T17:49:24Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - The Variability of Model Specification [2.4939887831898457]
It is regarded as an axiom that a good model is one that strikes a compromise between bias and variance.
We investigate various regression model frameworks, including generalized linear models, Cox proportional hazards models, and ARMA models, and illustrate how misspecifying a model affects the variance.
arXiv Detail & Related papers (2021-10-06T03:59:19Z) - Information-Theoretic Generalization Bounds for Iterative
Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z) - On the Minimal Error of Empirical Risk Minimization [90.09093901700754]
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression.
Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data.
arXiv Detail & Related papers (2021-02-24T04:47:55Z) - The Predictive Normalized Maximum Likelihood for Over-parameterized
Linear Regression with Norm Constraint: Regret and Double Descent [12.929639356256928]
We show that modern machine learning models do not obey a trade-off between the complexity of a prediction rule and its ability to generalize.
We use the recently proposed predictive normalized maximum likelihood (pNML) which is the min-max regret solution for individual data.
We demonstrate the use of the pNML regret as a point-wise learnability measure on synthetic data and show that it can successfully predict the double-descent phenomenon.
arXiv Detail & Related papers (2021-02-14T15:49:04Z) - Memorizing without overfitting: Bias, variance, and interpolation in
over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - When Ensembling Smaller Models is More Efficient than Single Large
Models [52.38997176317532]
We show that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute.
This suggests that exploiting output diversity through ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)