Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles
- URL: http://arxiv.org/abs/2307.03176v3
- Date: Tue, 9 Jan 2024 20:37:17 GMT
- Title: Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles
- Authors: Benjamin S. Ruben, Cengiz Pehlevan
- Abstract summary: We develop a theory of feature-bagging in noisy least-squares ridge ensembles.
We demonstrate that subsampling shifts the double-descent peak of a linear predictor.
We compare the performance of a feature-subsampling ensemble to a single linear predictor.
- Score: 34.32021888691789
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Feature bagging is a well-established ensembling method which aims to reduce
prediction variance by combining predictions of many estimators trained on
subsets or projections of features. Here, we develop a theory of
feature-bagging in noisy least-squares ridge ensembles and simplify the
resulting learning curves in the special case of equicorrelated data. Using
analytical learning curves, we demonstrate that subsampling shifts the
double-descent peak of a linear predictor. This leads us to introduce
heterogeneous feature ensembling, with estimators built on varying numbers of
feature dimensions, as a computationally efficient method to mitigate
double-descent. Then, we compare the performance of a feature-subsampling
ensemble to a single linear predictor, describing a trade-off between noise
amplification due to subsampling and noise reduction due to ensembling. Our
qualitative insights carry over to linear classifiers applied to image
classification tasks with realistic datasets constructed using a
state-of-the-art deep learning feature map.
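For concreteness, here is a minimal, illustrative sketch in Python/NumPy of the kind of feature-subsampling ridge ensemble the abstract describes, including the heterogeneous variant in which members are fit on differing numbers of feature dimensions. The function names, the synthetic data, and all parameter values (subset sizes, ridge penalty, noise level) are assumptions made for illustration, not the authors' exact construction.

import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def feature_bagged_ridge(X, y, subset_sizes, lam):
    # One ensemble member per entry of subset_sizes; a heterogeneous ensemble
    # simply uses different subset sizes for different members.
    members = []
    for k in subset_sizes:
        idx = rng.choice(X.shape[1], size=k, replace=False)  # random feature subset
        members.append((idx, fit_ridge(X[:, idx], y, lam)))
    return members

def ensemble_predict(members, X):
    # Equal-weight average of the members' predictions.
    return np.mean([X[:, idx] @ w for idx, w in members], axis=0)

# Synthetic noisy linear data: y = X w* + noise (illustrative only).
n, d = 200, 100
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)

members = feature_bagged_ridge(X, y, subset_sizes=[20, 40, 60, 80], lam=1e-2)
X_test = rng.standard_normal((1000, d))
test_mse = np.mean((ensemble_predict(members, X_test) - X_test @ w_star) ** 2)
print(f"ensemble test MSE: {test_mse:.3f}")

In this sketch, averaging over members is the source of the noise reduction mentioned in the abstract, while fitting each member on fewer features than the full predictor is the source of the noise amplification; sweeping the subset sizes and the number of samples would trace out the learning curves and the shifted double-descent peak that the paper analyzes.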
Related papers
- Hodge-Aware Contrastive Learning [101.56637264703058]
Simplicial complexes prove effective in modeling data with multiway dependencies.
We develop a contrastive self-supervised learning approach for processing simplicial data.
arXiv Detail & Related papers (2023-09-14T00:40:07Z)
- Informative regularization for a multi-layer perceptron RR Lyrae classifier under data shift [3.303002683812084]
We propose a scalable and easily adaptable approach based on an informative regularization and an ad-hoc training procedure to mitigate the shift problem.
Our method provides a new path to incorporate knowledge from characteristic features into artificial neural networks to manage the underlying data shift problem.
arXiv Detail & Related papers (2023-03-12T02:49:19Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning adaptive weights for the reweighted score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy [17.81999485513265]
We study gradient descent under linearly correlated noise.
We use our results to develop new, effective matrix factorizations for differentially private optimization.
arXiv Detail & Related papers (2023-02-02T23:32:24Z)
- Local Graph-homomorphic Processing for Privatized Distributed Systems [57.14673504239551]
We show that the added noise does not affect the performance of the learned model.
This is a significant improvement over previous works on differential privacy for distributed algorithms.
arXiv Detail & Related papers (2022-10-26T10:00:14Z)
- Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach [5.975670441166475]
We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations.
The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold.
The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering, and prediction.
arXiv Detail & Related papers (2022-02-28T22:46:34Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space and show that, with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection [54.98042023365694]
We propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples.
The proposed model consists of two sub-models parameterized by neural networks.
arXiv Detail & Related papers (2020-07-23T18:47:36Z)
- Learning Randomly Perturbed Structured Predictors for Direct Loss Minimization [18.981576950505442]
Direct loss minimization is a popular approach for learning predictors over structured label spaces.
We show that it achieves a better balance between the learned score function and the randomized noise in structured prediction.
arXiv Detail & Related papers (2020-07-11T08:59:11Z)