Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles
- URL: http://arxiv.org/abs/2307.03176v3
- Date: Tue, 9 Jan 2024 20:37:17 GMT
- Title: Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles
- Authors: Benjamin S. Ruben, Cengiz Pehlevan
- Abstract summary: We develop a theory of feature-bagging in noisy least-squares ridge ensembles.
We demonstrate that subsampling shifts the double-descent peak of a linear predictor.
We compare the performance of a feature-subsampling ensemble to a single linear predictor.
- Score: 34.32021888691789
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Feature bagging is a well-established ensembling method which aims to reduce
prediction variance by combining predictions of many estimators trained on
subsets or projections of features. Here, we develop a theory of
feature-bagging in noisy least-squares ridge ensembles and simplify the
resulting learning curves in the special case of equicorrelated data. Using
analytical learning curves, we demonstrate that subsampling shifts the
double-descent peak of a linear predictor. This leads us to introduce
heterogeneous feature ensembling, with estimators built on varying numbers of
feature dimensions, as a computationally efficient method to mitigate
double-descent. Then, we compare the performance of a feature-subsampling
ensemble to a single linear predictor, describing a trade-off between noise
amplification due to subsampling and noise reduction due to ensembling. Our
qualitative insights carry over to linear classifiers applied to image
classification tasks with realistic datasets constructed using a
state-of-the-art deep learning feature map.
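For concreteness, here is a minimal, illustrative sketch in Python/NumPy of the kind of feature-subsampling ridge ensemble the abstract describes, including the heterogeneous variant in which members are fit on differing numbers of feature dimensions. The function names, the synthetic data, and all parameter values (subset sizes, ridge penalty, noise level) are assumptions made for illustration, not the authors' exact construction.

import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def feature_bagged_ridge(X, y, subset_sizes, lam):
    # One ensemble member per entry of subset_sizes; a heterogeneous ensemble
    # simply uses different subset sizes for different members.
    members = []
    for k in subset_sizes:
        idx = rng.choice(X.shape[1], size=k, replace=False)  # random feature subset
        members.append((idx, fit_ridge(X[:, idx], y, lam)))
    return members

def ensemble_predict(members, X):
    # Equal-weight average of the members' predictions.
    return np.mean([X[:, idx] @ w for idx, w in members], axis=0)

# Synthetic noisy linear data: y = X w* + noise (illustrative only).
n, d = 200, 100
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)

members = feature_bagged_ridge(X, y, subset_sizes=[20, 40, 60, 80], lam=1e-2)
X_test = rng.standard_normal((1000, d))
test_mse = np.mean((ensemble_predict(members, X_test) - X_test @ w_star) ** 2)
print(f"ensemble test MSE: {test_mse:.3f}")

In this sketch, averaging over members is the source of the noise reduction mentioned in the abstract, while fitting each member on fewer features than the full predictor is the source of the noise amplification; sweeping the subset sizes and the number of samples would trace out the learning curves and the shifted double-descent peak that the paper analyzes.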
Related papers
- Hodge-Aware Contrastive Learning [101.56637264703058]
Simplicial complexes prove effective in modeling data with multiway dependencies.
We develop a contrastive self-supervised learning approach for processing simplicial data.
arXiv Detail & Related papers (2023-09-14T00:40:07Z)
- Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift [3.303002683812084]
We propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem.
Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
arXiv Detail & Related papers (2023-03-12T02:49:19Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning adaptive weights for the reweighted score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy [17.81999485513265]
We study gradient descent under linearly correlated noise.
We use our results to develop new, effective matrix factorizations for differentially private optimization.
arXiv Detail & Related papers (2023-02-02T23:32:24Z)
- Local Graph-homomorphic Processing for Privatized Distributed Systems [57.14673504239551]
We show that the added noise does not affect the performance of the learned model.
This is a significant improvement over previous works on differential privacy for distributed algorithms.
arXiv Detail & Related papers (2022-10-26T10:00:14Z)
- Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach [5.975670441166475]
We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations.
The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold.
The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering, and prediction.
arXiv Detail & Related papers (2022-02-28T22:46:34Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space and show that, with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection [54.98042023365694]
We propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples.
The proposed model consists of two sub-models parameterized by neural networks.
arXiv Detail & Related papers (2020-07-23T18:47:36Z)
- Learning Randomly Perturbed Structured Predictors for Direct Loss Minimization [18.981576950505442]
Direct loss minimization is a popular approach for learning predictors over structured label spaces.
We show that it achieves a better balance between the learned score function and the randomized noise in structured prediction.
arXiv Detail & Related papers (2020-07-11T08:59:11Z)