Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge
Ensembles
- URL: http://arxiv.org/abs/2307.03176v3
- Date: Tue, 9 Jan 2024 20:37:17 GMT
- Title: Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge
Ensembles
- Authors: Benjamin S. Ruben, Cengiz Pehlevan
- Abstract summary: We develop a theory of feature-bagging in noisy least-squares ridge ensembles.
We demonstrate that subsampling shifts the double-descent peak of a linear predictor.
We compare the performance of a feature-subsampling ensemble to a single linear predictor.
- Score: 34.32021888691789
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Feature bagging is a well-established ensembling method which aims to reduce
prediction variance by combining predictions of many estimators trained on
subsets or projections of features. Here, we develop a theory of
feature-bagging in noisy least-squares ridge ensembles and simplify the
resulting learning curves in the special case of equicorrelated data. Using
analytical learning curves, we demonstrate that subsampling shifts the
double-descent peak of a linear predictor. This leads us to introduce
heterogeneous feature ensembling, with estimators built on varying numbers of
feature dimensions, as a computationally efficient method to mitigate
double-descent. Then, we compare the performance of a feature-subsampling
ensemble to a single linear predictor, describing a trade-off between noise
amplification due to subsampling and noise reduction due to ensembling. Our
qualitative insights carry over to linear classifiers applied to image
classification tasks with realistic datasets constructed using a
state-of-the-art deep learning feature map.
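Below is a minimal sketch of the kind of estimator the abstract describes: a heterogeneous feature-subsampled ridge ensemble fit on equicorrelated Gaussian data, compared against a single ridge predictor on the full feature set. The data-generating parameters, subset sizes, and ridge penalty are illustrative assumptions, not the paper's exact setup or notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def equicorrelated_features(n, d, c=0.3):
    """n samples of d Gaussian features with covariance (1 - c) * I + c * 11^T (illustrative)."""
    cov = (1.0 - c) * np.eye(d) + c * np.ones((d, d))
    return rng.multivariate_normal(np.zeros(d), cov, size=n)

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge solution w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def fit_subsampled_ensemble(X, y, subset_sizes, lam=1e-2):
    """One ridge regressor per random feature subset; sizes may differ (heterogeneous ensemble)."""
    members = []
    for k in subset_sizes:
        idx = rng.choice(X.shape[1], size=k, replace=False)
        members.append((idx, fit_ridge(X[:, idx], y, lam)))
    return members

def ensemble_predict(members, X):
    """Average the member predictions."""
    return np.mean([X[:, idx] @ w for idx, w in members], axis=0)

# Illustrative setup: noisy linear targets on equicorrelated features.
n_train, n_test, d, noise = 200, 2000, 300, 0.5
w_star = rng.normal(size=d) / np.sqrt(d)                 # ground-truth weights
X_tr = equicorrelated_features(n_train, d)
y_tr = X_tr @ w_star + noise * rng.normal(size=n_train)  # noisy labels
X_te = equicorrelated_features(n_test, d)
y_te = X_te @ w_star + noise * rng.normal(size=n_test)

w_single = fit_ridge(X_tr, y_tr)
members = fit_subsampled_ensemble(X_tr, y_tr, subset_sizes=[50, 100, 150, 200])

mse_single = np.mean((X_te @ w_single - y_te) ** 2)
mse_ensemble = np.mean((ensemble_predict(members, X_te) - y_te) ** 2)
print(f"single ridge MSE: {mse_single:.3f}   ensemble MSE: {mse_ensemble:.3f}")
```

Whether the ensemble or the single predictor wins on a given draw depends on the label noise and the chosen subset sizes, mirroring the trade-off between noise amplification from subsampling and noise reduction from ensembling described in the abstract.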
Related papers
- Dimension-free Score Matching and Time Bootstrapping for Diffusion Models [11.743167854433306]
Diffusion models generate samples by estimating the score function of the target distribution at various noise levels.
In this work, we establish the first (nearly) dimension-free sample complexity bounds for learning these score functions.
A key aspect of our analysis is the use of a single function approximator to jointly estimate scores across noise levels.
arXiv Detail & Related papers (2025-02-14T18:32:22Z)
- Hodge-Aware Contrastive Learning [101.56637264703058]
Simplicial complexes prove effective in modeling data with multiway dependencies.
We develop a contrastive self-supervised learning approach for processing simplicial data.
arXiv Detail & Related papers (2023-09-14T00:40:07Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy [17.81999485513265]
We study gradient descent under linearly correlated noise.
We use our results to develop new, effective matrix factorizations for differentially private optimization.
arXiv Detail & Related papers (2023-02-02T23:32:24Z)
- Local Graph-homomorphic Processing for Privatized Distributed Systems [57.14673504239551]
We show that the added noise does not affect the performance of the learned model.
This is a significant improvement over previous work on differential privacy for distributed algorithms.
arXiv Detail & Related papers (2022-10-26T10:00:14Z)
- Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach [5.975670441166475]
We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations.
The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold.
The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction.
arXiv Detail & Related papers (2022-02-28T22:46:34Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection [54.98042023365694]
We propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples.
The proposed model consists of two sub-models parameterized by neural networks.
arXiv Detail & Related papers (2020-07-23T18:47:36Z)
- Learning Randomly Perturbed Structured Predictors for Direct Loss Minimization [18.981576950505442]
Direct loss minimization is a popular approach for learning predictors over structured label spaces.
We show that it achieves a better balance between the learned score function and the randomized noise in structured prediction.
arXiv Detail & Related papers (2020-07-11T08:59:11Z)