Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces
- URL: http://arxiv.org/abs/2210.02604v1
- Date: Wed, 5 Oct 2022 23:31:54 GMT
- Title: Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces
- Authors: Amirali Aghazadeh and Nived Rajaraman and Tony Tu and Kannan Ramchandran
- Abstract summary: We show that regularizing the spectral representation of machine learning models improves their generalization power when labeled data is scarce.
Running gradient descent on the regularized loss results in better generalization performance than baseline algorithms in several data-scarce real-world problems.
- Score: 13.36217184117654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-driven machine learning models are being increasingly employed in
several important inference problems in biology, chemistry, and physics which
require learning over combinatorial spaces. Recent empirical evidence (see,
e.g., [1], [2], [3]) suggests that regularizing the spectral representation of
such models improves their generalization power when labeled data is scarce.
However, despite these empirical studies, the theoretical underpinning of when
and how spectral regularization enables improved generalization is poorly
understood. In this paper, we focus on learning pseudo-Boolean functions and
demonstrate that regularizing the empirical mean squared error by the L_1 norm
of the spectral transform of the learned function reshapes the loss landscape
and allows for data-frugal learning, under a restricted secant condition on the
learner's empirical error measured against the ground truth function. Under a
weaker quadratic growth condition, we show that stationary points which also
approximately interpolate the training data points achieve statistically
optimal generalization performance. Complementing our theory, we empirically
demonstrate that running gradient descent on the regularized loss results in
better generalization performance than baseline algorithms in several
data-scarce real-world problems.
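As a reading aid, here is a minimal sketch of the kind of objective the abstract describes: a pseudo-Boolean function parameterized by its Walsh-Hadamard (monomial) spectrum, fit by subgradient descent on the empirical mean squared error plus an L_1 penalty on the spectral coefficients. This is an illustration under my own assumptions (full 2^n basis, so only tractable for small n; arbitrary step size and penalty weight), not the authors' implementation.

```python
import itertools
import numpy as np

def monomial_features(X):
    """Map points in {-1, +1}^n to the full 2^n Walsh-Hadamard
    (monomial) basis: chi_S(x) = prod_{i in S} x_i."""
    n = X.shape[1]
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    return np.stack(
        [np.prod(X[:, list(S)], axis=1) for S in subsets], axis=1)

def fit_spectral_l1(X, y, lam=0.05, lr=0.01, steps=5000):
    """Subgradient descent on MSE(alpha) + lam * ||alpha||_1, where
    alpha is the spectral transform of the learned function."""
    Phi = monomial_features(X)        # (m, 2^n) design matrix
    alpha = np.zeros(Phi.shape[1])    # spectral coefficients
    m = len(y)
    for _ in range(steps):
        grad = (2.0 / m) * Phi.T @ (Phi @ alpha - y)
        alpha -= lr * (grad + lam * np.sign(alpha))
    return alpha

# Data-scarce demo: a 6-bit function with a 3-sparse spectrum,
# recovered from m = 20 << 2^6 labeled points.
rng = np.random.default_rng(0)
n, m = 6, 20
X = rng.choice([-1.0, 1.0], size=(m, n))
alpha_true = np.zeros(2 ** n)
alpha_true[[1, 5, 12]] = [1.0, -0.5, 0.8]   # arbitrary sparse spectrum
y = monomial_features(X) @ alpha_true
alpha_hat = fit_spectral_l1(X, y)
```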
Related papers
- Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra [0.0]
Neural scaling laws describe how the performance of deep neural networks scales with key factors such as training data size, model complexity, and training time.
We employ techniques from statistical mechanics to analyze one-pass gradient descent within a student-teacher framework.
arXiv Detail & Related papers (2024-10-11T17:21:42Z)
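To make the student-teacher setup in the entry above concrete, here is a toy linear version of my own construction (the paper analyzes two-layer networks): inputs with a power-law covariance spectrum, a linear teacher, and a student trained by one-pass SGD, with each sample visited exactly once.

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps, lr, a = 100, 5000, 0.05, 1.5

eigs = np.arange(1, d + 1, dtype=float) ** -a   # power-law data spectrum
teacher = rng.standard_normal(d)
w = np.zeros(d)                                  # student weights

risks = []
for t in range(steps):
    x = np.sqrt(eigs) * rng.standard_normal(d)   # x ~ N(0, diag(eigs))
    y = teacher @ x
    w -= lr * (w @ x - y) * x                    # one-pass SGD, squared loss
    if (t + 1) % 1000 == 0:
        # Population risk is exact for this linear student-teacher pair.
        risks.append(float(np.sum(eigs * (w - teacher) ** 2)))
print(risks)   # risk falls as more samples are consumed
```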
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
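The entry above does not define the loss; assuming the standard unhinged loss of van Rooyen et al. (2015), ell(y, f(x)) = 1 - y f(x) (the 2023 paper may differ in details), its gradient for a linear model does not depend on the weights, so full-batch gradient descent admits an exact closed form. That is the kind of tractability the entry alludes to:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
y = rng.choice([-1.0, 1.0], size=50)

# Unhinged loss for f(x) = w @ x: mean(1 - y * (X @ w)). Its gradient,
# -mean(y_i * x_i), is constant in w, so t steps of gradient descent
# from w0 = 0 have the closed form w_t = t * lr * g.
g = (y[:, None] * X).mean(axis=0)
lr, t = 0.1, 100
w_closed = t * lr * g

w = np.zeros(5)                       # verify against explicit GD
for _ in range(t):
    w += lr * g
assert np.allclose(w, w_closed)
print((1.0 - y * (X @ w)).mean())     # unhinged training loss after GD
```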
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps explain why re-weighting and logit-adjustment work.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
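The two techniques compared in the entry above are standard baselines (this sketch is not the paper's data-dependent contraction): re-weighting scales each example's cross-entropy by an inverse class frequency, while logit adjustment, in the style of Menon et al. (2021), shifts the logits by the log class priors.

```python
import numpy as np

def softmax_xent(logits, labels, class_weights=None):
    """Cross-entropy; optional per-class weights implement re-weighting."""
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(labels)), labels]
    if class_weights is not None:
        nll = nll * class_weights[labels]
    return nll.mean()

priors = np.array([0.90, 0.09, 0.01])        # long-tailed class priors
logits = np.random.default_rng(2).standard_normal((8, 3))
labels = np.array([0, 0, 0, 0, 1, 1, 2, 2])

reweighted = softmax_xent(logits, labels, class_weights=1.0 / priors)
adjusted = softmax_xent(logits + np.log(priors), labels)  # logit adjustment
```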
- Understanding Generalization of Federated Learning via Stability: Heterogeneity Matters [1.4502611532302039]
Generalization performance is a key metric in evaluating machine learning models when applied to real-world applications.
arXiv Detail & Related papers (2023-06-06T16:12:35Z)
- Information-Theoretic Generalization Bounds for Iterative Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z)
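The iterative SSL algorithms in the entry above are pseudo-label-style self-training loops; the generic loop below is my rendering of that pattern, with a nearest-centroid learner standing in for the paper's model, not the paper's exact setup.

```python
import numpy as np

def self_train(Xl, yl, Xu, rounds=5):
    """Iterative SSL: fit on the current labeled pool, pseudo-label the
    unlabeled pool, and absorb the pseudo-labels on the next round."""
    X, y = Xl, yl
    for _ in range(rounds):
        centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        d2 = ((Xu[:, None, :] - centroids[None]) ** 2).sum(axis=2)
        pseudo = d2.argmin(axis=1)               # current pseudo-labels
        X = np.concatenate([Xl, Xu])
        y = np.concatenate([yl, pseudo])
    return centroids

rng = np.random.default_rng(3)
Xl = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
yl = np.array([0] * 5 + [1] * 5)
Xu = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
centroids = self_train(Xl, yl, Xu)   # per-round gains saturate quickly
```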
- Out-of-Distribution Generalization in Kernel Regression [21.958028127426196]
We study generalization in kernel regression when the training and test distributions are different.
We identify an overlap matrix that quantifies the mismatch between distributions for a given kernel.
We develop procedures for optimizing training and test distributions for a given data budget to find best and worst case generalizations under the shift.
arXiv Detail & Related papers (2021-06-04T04:54:25Z)
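The overlap matrix in the entry above is specific to the paper's analysis; to merely reproduce the experimental setting, here is a minimal kernel ridge regression trained on one Gaussian input distribution and evaluated on a mean-shifted one. The kernel, bandwidth, target function, and shift are my arbitrary choices.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between row-sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(4)
target = lambda X: np.sin(X).sum(axis=1)      # ground-truth function

Xtr = rng.normal(0.0, 1.0, (80, 2))           # training distribution
Xte = rng.normal(1.5, 1.0, (200, 2))          # shifted test distribution

K = rbf(Xtr, Xtr)
coef = np.linalg.solve(K + 1e-3 * np.eye(len(Xtr)), target(Xtr))
pred = rbf(Xte, Xtr) @ coef                   # kernel ridge prediction
ood_mse = float(((pred - target(Xte)) ** 2).mean())
```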
- Functional Regularization for Representation Learning: A Unified Theoretical Perspective [27.93916012334704]
Unsupervised and self-supervised learning approaches have become crucial tools for learning representations for downstream prediction tasks.
We present a unifying perspective where several such approaches can be viewed as imposing a regularization on the representation via a learnable function using unlabeled data.
We propose a discriminative theoretical framework for analyzing the sample complexity of these approaches, which generalizes the framework of (Balcan and Blum, 2010) to allow learnable regularization functions.
arXiv Detail & Related papers (2020-08-06T04:06:04Z)
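Schematically, the unifying view in the entry above adds, to the supervised loss on labeled data, a penalty computed from the representation on unlabeled data. The objective below is my abstraction of that template, with an autoencoder-style penalty standing in for the learnable regularization function.

```python
import numpy as np

def objective(W, Xl, yl, Xu, lam=0.1):
    """Supervised loss on labeled data plus a functional regularizer
    evaluated on unlabeled data."""
    h = np.tanh(Xl @ W)                        # learned representation
    sup = ((h.sum(axis=1) - yl) ** 2).mean()   # toy supervised head
    hu = np.tanh(Xu @ W)                       # representation, unlabeled
    reg = ((hu @ W.T - Xu) ** 2).mean()        # reconstruction penalty
    return sup + lam * reg

rng = np.random.default_rng(5)
W = 0.1 * rng.standard_normal((4, 3))
print(objective(W, rng.standard_normal((10, 4)), rng.standard_normal(10),
                rng.standard_normal((200, 4))))
```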
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
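A brute-force caricature of the methodology in the entry above (the paper's computation is analytical; this rejection-sampling version is mine): sample random linear classifiers, keep those that interpolate a small training set, and inspect the spread of their test errors.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n_train, n_test = 20, 8, 2000
w_star = rng.standard_normal(d)                 # ground-truth direction
Xtr = rng.standard_normal((n_train, d))
ytr = np.sign(Xtr @ w_star)
Xte = rng.standard_normal((n_test, d))
yte = np.sign(Xte @ w_star)

test_errors = []
while len(test_errors) < 200:
    w = rng.standard_normal(d)                  # random candidate
    if np.all(np.sign(Xtr @ w) == ytr):         # keep interpolators only
        test_errors.append(float((np.sign(Xte @ w) != yte).mean()))

# Typical error concentrates well below the worst sampled interpolator.
print(np.mean(test_errors), np.max(test_errors))
```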
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
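The feature-averaging claim in the entry above is, at heart, a Jensen's-inequality effect for convex losses. The sketch below is my illustration, using a sign-flip augmentation group and squared loss; it checks that the loss of the averaged prediction never exceeds the average loss over augmentations.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 4))
y = rng.standard_normal(100)
w = rng.standard_normal(4)

# Augmentation group: identity and a global sign flip of the input.
preds = np.stack([X @ w, (-X) @ w])            # (|G|, n) predictions
sq_loss = lambda p: (p - y) ** 2               # convex loss

avg_of_losses = sq_loss(preds).mean(axis=0).mean()
loss_of_avg = sq_loss(preds.mean(axis=0)).mean()   # average features first
assert loss_of_avg <= avg_of_losses + 1e-12        # Jensen's inequality
```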
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.