Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces
- URL: http://arxiv.org/abs/2210.02604v1
- Date: Wed, 5 Oct 2022 23:31:54 GMT
- Title: Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces
- Authors: Amirali Aghazadeh and Nived Rajaraman and Tony Tu and Kannan Ramchandran
- Abstract summary: We show that regularizing the spectral representation of machine learning models improves their generalization power when labeled data is scarce.
Running gradient descent on the regularized loss results in better generalization performance than baseline algorithms in several data-scarce real-world problems.
- Score: 13.36217184117654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-driven machine learning models are being increasingly employed in
several important inference problems in biology, chemistry, and physics which
require learning over combinatorial spaces. Recent empirical evidence (see,
e.g., [1], [2], [3]) suggests that regularizing the spectral representation of
such models improves their generalization power when labeled data is scarce.
However, despite these empirical studies, the theoretical underpinning of when
and how spectral regularization enables improved generalization is poorly
understood. In this paper, we focus on learning pseudo-Boolean functions and
demonstrate that regularizing the empirical mean squared error by the L_1 norm
of the spectral transform of the learned function reshapes the loss landscape
and allows for data-frugal learning, under a restricted secant condition on the
learner's empirical error measured against the ground truth function. Under a
weaker quadratic growth condition, we show that stationary points which also
approximately interpolate the training data points achieve statistically
optimal generalization performance. Complementing our theory, we empirically
demonstrate that running gradient descent on the regularized loss results in
better generalization performance than baseline algorithms in several
data-scarce real-world problems.
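As a reading aid, here is a minimal sketch of the kind of objective the abstract describes: a pseudo-Boolean function parameterized by its Walsh-Hadamard (monomial) spectrum, fit by subgradient descent on the empirical mean squared error plus an L_1 penalty on the spectral coefficients. This is an illustration under my own assumptions (full 2^n basis, so only tractable for small n; arbitrary step size and penalty weight), not the authors' implementation.

```python
import itertools
import numpy as np

def monomial_features(X):
    """Map points in {-1, +1}^n to the full 2^n Walsh-Hadamard
    (monomial) basis: chi_S(x) = prod_{i in S} x_i."""
    n = X.shape[1]
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    return np.stack(
        [np.prod(X[:, list(S)], axis=1) for S in subsets], axis=1)

def fit_spectral_l1(X, y, lam=0.05, lr=0.01, steps=5000):
    """Subgradient descent on MSE(alpha) + lam * ||alpha||_1, where
    alpha is the spectral transform of the learned function."""
    Phi = monomial_features(X)        # (m, 2^n) design matrix
    alpha = np.zeros(Phi.shape[1])    # spectral coefficients
    m = len(y)
    for _ in range(steps):
        grad = (2.0 / m) * Phi.T @ (Phi @ alpha - y)
        alpha -= lr * (grad + lam * np.sign(alpha))
    return alpha

# Data-scarce demo: a 6-bit function with a 3-sparse spectrum,
# recovered from m = 20 << 2^6 labeled points.
rng = np.random.default_rng(0)
n, m = 6, 20
X = rng.choice([-1.0, 1.0], size=(m, n))
alpha_true = np.zeros(2 ** n)
alpha_true[[1, 5, 12]] = [1.0, -0.5, 0.8]   # arbitrary sparse spectrum
y = monomial_features(X) @ alpha_true
alpha_hat = fit_spectral_l1(X, y)
```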
Related papers
- Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra [0.0]
Neural scaling laws describe how the performance of deep neural networks scales with key factors such as training data size, model complexity, and training time.
We employ techniques from statistical mechanics to analyze one-pass gradient descent within a student-teacher framework.
arXiv Detail & Related papers (2024-10-11T17:21:42Z)
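To make the student-teacher setup in the entry above concrete, here is a toy linear version of my own construction (the paper analyzes two-layer networks): inputs with a power-law covariance spectrum, a linear teacher, and a student trained by one-pass SGD, with each sample visited exactly once.

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps, lr, a = 100, 5000, 0.05, 1.5

eigs = np.arange(1, d + 1, dtype=float) ** -a   # power-law data spectrum
teacher = rng.standard_normal(d)
w = np.zeros(d)                                  # student weights

risks = []
for t in range(steps):
    x = np.sqrt(eigs) * rng.standard_normal(d)   # x ~ N(0, diag(eigs))
    y = teacher @ x
    w -= lr * (w @ x - y) * x                    # one-pass SGD, squared loss
    if (t + 1) % 1000 == 0:
        # Population risk is exact for this linear student-teacher pair.
        risks.append(float(np.sum(eigs * (w - teacher) ** 2)))
print(risks)   # risk falls as more samples are consumed
```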
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
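The entry above does not define the loss; assuming the standard unhinged loss of van Rooyen et al. (2015), ell(y, f(x)) = 1 - y f(x) (the 2023 paper may differ in details), its gradient for a linear model does not depend on the weights, so full-batch gradient descent admits an exact closed form. That is the kind of tractability the entry alludes to:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
y = rng.choice([-1.0, 1.0], size=50)

# Unhinged loss for f(x) = w @ x: mean(1 - y * (X @ w)). Its gradient,
# -mean(y_i * x_i), is constant in w, so t steps of gradient descent
# from w0 = 0 have the closed form w_t = t * lr * g.
g = (y[:, None] * X).mean(axis=0)
lr, t = 0.1, 100
w_closed = t * lr * g

w = np.zeros(5)                       # verify against explicit GD
for _ in range(t):
    w += lr * g
assert np.allclose(w, w_closed)
print((1.0 - y * (X @ w)).mean())     # unhinged training loss after GD
```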
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps explain why re-weighting and logit-adjustment work.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
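The two techniques compared in the entry above are standard baselines (this sketch is not the paper's data-dependent contraction): re-weighting scales each example's cross-entropy by an inverse class frequency, while logit adjustment, in the style of Menon et al. (2021), shifts the logits by the log class priors.

```python
import numpy as np

def softmax_xent(logits, labels, class_weights=None):
    """Cross-entropy; optional per-class weights implement re-weighting."""
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(labels)), labels]
    if class_weights is not None:
        nll = nll * class_weights[labels]
    return nll.mean()

priors = np.array([0.90, 0.09, 0.01])        # long-tailed class priors
logits = np.random.default_rng(2).standard_normal((8, 3))
labels = np.array([0, 0, 0, 0, 1, 1, 2, 2])

reweighted = softmax_xent(logits, labels, class_weights=1.0 / priors)
adjusted = softmax_xent(logits + np.log(priors), labels)  # logit adjustment
```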
- Understanding Generalization of Federated Learning via Stability: Heterogeneity Matters [1.4502611532302039]
Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications.
arXiv Detail & Related papers (2023-06-06T16:12:35Z)
- Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z)
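The iterative SSL algorithms in the entry above are pseudo-label-style self-training loops; the generic loop below is my rendering of that pattern, with a nearest-centroid learner standing in for the paper's model, not the paper's exact setup.

```python
import numpy as np

def self_train(Xl, yl, Xu, rounds=5):
    """Iterative SSL: fit on the current labeled pool, pseudo-label the
    unlabeled pool, and absorb the pseudo-labels on the next round."""
    X, y = Xl, yl
    for _ in range(rounds):
        centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        d2 = ((Xu[:, None, :] - centroids[None]) ** 2).sum(axis=2)
        pseudo = d2.argmin(axis=1)               # current pseudo-labels
        X = np.concatenate([Xl, Xu])
        y = np.concatenate([yl, pseudo])
    return centroids

rng = np.random.default_rng(3)
Xl = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
yl = np.array([0] * 5 + [1] * 5)
Xu = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
centroids = self_train(Xl, yl, Xu)   # per-round gains saturate quickly
```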
- Out-of-Distribution Generalization in Kernel Regression [21.958028127426196]
We study generalization in kernel regression when the training and test distributions are different.
We identify an overlap matrix that quantifies the mismatch between distributions for a given kernel.
We develop procedures for optimizing training and test distributions for a given data budget to find best and worst case generalizations under the shift.
arXiv Detail & Related papers (2021-06-04T04:54:25Z)
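The overlap matrix in the entry above is specific to the paper's analysis; to merely reproduce the experimental setting, here is a minimal kernel ridge regression trained on one Gaussian input distribution and evaluated on a mean-shifted one. The kernel, bandwidth, target function, and shift are my arbitrary choices.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between row-sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(4)
target = lambda X: np.sin(X).sum(axis=1)      # ground-truth function

Xtr = rng.normal(0.0, 1.0, (80, 2))           # training distribution
Xte = rng.normal(1.5, 1.0, (200, 2))          # shifted test distribution

K = rbf(Xtr, Xtr)
coef = np.linalg.solve(K + 1e-3 * np.eye(len(Xtr)), target(Xtr))
pred = rbf(Xte, Xtr) @ coef                   # kernel ridge prediction
ood_mse = float(((pred - target(Xte)) ** 2).mean())
```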
- Functional Regularization for Representation Learning: A Unified Theoretical Perspective [27.93916012334704]
Unsupervised and self-supervised learning approaches have become crucial tools for learning representations for downstream prediction tasks.
We present a unifying perspective where several such approaches can be viewed as imposing a regularization on the representation via a learnable function using unlabeled data.
We propose a discriminative theoretical framework for analyzing the sample complexity of these approaches, which generalizes the framework of (Balcan and Blum, 2010) to allow learnable regularization functions.
arXiv Detail & Related papers (2020-08-06T04:06:04Z)
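Schematically, the unifying view in the entry above adds, to the supervised loss on labeled data, a penalty computed from the representation on unlabeled data. The objective below is my abstraction of that template, with an autoencoder-style penalty standing in for the learnable regularization function.

```python
import numpy as np

def objective(W, Xl, yl, Xu, lam=0.1):
    """Supervised loss on labeled data plus a functional regularizer
    evaluated on unlabeled data."""
    h = np.tanh(Xl @ W)                        # learned representation
    sup = ((h.sum(axis=1) - yl) ** 2).mean()   # toy supervised head
    hu = np.tanh(Xu @ W)                       # representation, unlabeled
    reg = ((hu @ W.T - Xu) ** 2).mean()        # reconstruction penalty
    return sup + lam * reg

rng = np.random.default_rng(5)
W = 0.1 * rng.standard_normal((4, 3))
print(objective(W, rng.standard_normal((10, 4)), rng.standard_normal(10),
                rng.standard_normal((200, 4))))
```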
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
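A brute-force caricature of the methodology in the entry above (the paper's computation is analytical; this rejection-sampling version is mine): sample random linear classifiers, keep those that interpolate a small training set, and inspect the spread of their test errors.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n_train, n_test = 20, 8, 2000
w_star = rng.standard_normal(d)                 # ground-truth direction
Xtr = rng.standard_normal((n_train, d))
ytr = np.sign(Xtr @ w_star)
Xte = rng.standard_normal((n_test, d))
yte = np.sign(Xte @ w_star)

test_errors = []
while len(test_errors) < 200:
    w = rng.standard_normal(d)                  # random candidate
    if np.all(np.sign(Xtr @ w) == ytr):         # keep interpolators only
        test_errors.append(float((np.sign(Xte @ w) != yte).mean()))

# Typical error concentrates well below the worst sampled interpolator.
print(np.mean(test_errors), np.max(test_errors))
```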
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
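The feature-averaging claim in the entry above is, at heart, a Jensen's-inequality effect for convex losses. The sketch below is my illustration, using a sign-flip augmentation group and squared loss; it checks that the loss of the averaged prediction never exceeds the average loss over augmentations.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 4))
y = rng.standard_normal(100)
w = rng.standard_normal(4)

# Augmentation group: identity and a global sign flip of the input.
preds = np.stack([X @ w, (-X) @ w])            # (|G|, n) predictions
sq_loss = lambda p: (p - y) ** 2               # convex loss

avg_of_losses = sq_loss(preds).mean(axis=0).mean()
loss_of_avg = sq_loss(preds.mean(axis=0)).mean()   # average features first
assert loss_of_avg <= avg_of_losses + 1e-12        # Jensen's inequality
```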
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.