Sketched Ridgeless Linear Regression: The Role of Downsampling
- URL: http://arxiv.org/abs/2302.01088v2
- Date: Fri, 13 Oct 2023 21:14:42 GMT
- Title: Sketched Ridgeless Linear Regression: The Role of Downsampling
- Authors: Xin Chen, Yicheng Zeng, Siyue Yang, Qiang Sun
- Abstract summary: We investigate two out-of-sample prediction risks of the sketched ridgeless least square estimator.
We identify the optimal sketching size that minimizes out-of-sample prediction risks.
We extend our analysis to cover central limit theorems and misspecified models.
- Score: 5.615701056715101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparametrization often helps improve the generalization performance. This
paper presents a dual view of overparametrization suggesting that downsampling
may also help generalize. Focusing on the proportional regime $m\asymp n \asymp
p$, where $m$ represents the sketching size, $n$ is the sample size, and $p$ is
the feature dimensionality, we investigate two out-of-sample prediction risks
of the sketched ridgeless least square estimator. Our findings challenge
conventional beliefs by showing that downsampling does not always harm
generalization but can actually improve it in certain cases. We identify the
optimal sketching size that minimizes out-of-sample prediction risks and
demonstrate that the optimally sketched estimator exhibits more stable risk curves,
eliminating the peaks of those for the full-sample estimator. To facilitate
practical implementation, we propose an empirical procedure to determine the
optimal sketching size. Finally, we extend our analysis to cover central limit
theorems and misspecified models. Numerical studies strongly support our
theory.
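For concreteness, the sketch-and-solve estimator discussed in the abstract can be written in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: a Gaussian sketching matrix $S \in \mathbb{R}^{m \times n}$ compresses the rows of $(X, y)$, and the minimum-norm least-squares solution on the sketched data plays the role of the ridgeless (ridge penalty $\to 0^+$) estimator. The function name and defaults are ours.

```python
import numpy as np

def sketched_ridgeless(X, y, m, seed=None):
    """Sketch-and-solve ridgeless least squares (illustrative sketch).

    Applies a Gaussian sketch S of size m x n to the rows of (X, y),
    then returns the minimum-norm least-squares solution on the
    sketched data. The paper studies this estimator in the
    proportional regime m ~ n ~ p; here m is a free parameter.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch
    SX, Sy = S @ X, S @ y
    # Minimum-norm solution via pseudoinverse (ridgeless limit)
    return np.linalg.pinv(SX) @ Sy
```

In the noiseless, underparametrized case (m > p with SX of full column rank), the sketched estimator recovers the true coefficients exactly; the interesting regimes the paper analyzes arise with noise and m comparable to n and p.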
Related papers
- Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning [5.293069542318491]
We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles.
For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally tuned using only the sketch size in infinite ensembles.
We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles.
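The GCV criterion mentioned above can be illustrated with the generic textbook formula for a single ridge fit (not the sketched-ensemble estimator from that paper): $\mathrm{GCV}(\lambda) = n\,\|(I - H_\lambda) y\|^2 / \mathrm{tr}(I - H_\lambda)^2$, where $H_\lambda = X (X^\top X + \lambda I)^{-1} X^\top$ is the ridge hat matrix. Function names are ours.

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """Generalized cross-validation score for a single ridge fit.

    GCV(lam) = n * ||(I - H) y||^2 / tr(I - H)^2, with
    H = X (X'X + lam*I)^{-1} X' the ridge hat matrix. A generic
    textbook formula, not the ensemble estimator from the paper.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return n * float(resid @ resid) / (n - np.trace(H)) ** 2

def select_lambda(X, y, grid):
    """Pick the penalty on a grid by minimizing the GCV score."""
    return min(grid, key=lambda lam: ridge_gcv(X, y, lam))
```

The appeal of GCV is that it estimates prediction risk from a single fit, without refitting on held-out folds; the ensemble results above extend this consistency to sketched ridge ensembles.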
arXiv Detail & Related papers (2023-10-06T16:27:43Z)
- Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation [4.87717454493713]
We study subsampling-based ridge ensembles in the proportional regime.
We prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor.
arXiv Detail & Related papers (2023-04-25T17:43:27Z)
- Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
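The clipping idea can be illustrated with a toy streaming mean estimator (our sketch, not that paper's algorithm or tuning): each sample contributes a gradient that is clipped in magnitude before the SGD step, so rare heavy-tailed samples cannot derail the iterate. The step size and clip level below are illustrative choices.

```python
import numpy as np

def clipped_streaming_mean(stream, eta=0.1, tau=5.0):
    """Streaming mean estimation with clipped SGD (toy sketch).

    For each sample x, the gradient of 0.5 * (theta - x)^2 is
    g = theta - x; clipping g to [-tau, tau] bounds the influence
    of any single heavy-tailed observation. eta and tau are
    illustrative, not tuned values from the paper.
    """
    theta = 0.0
    for t, x in enumerate(stream, start=1):
        g = theta - x                       # per-sample gradient
        g = max(-tau, min(tau, g))          # clip the gradient
        theta -= (eta / np.sqrt(t)) * g     # decaying step size
    return theta
```

With symmetric noise around the true mean, clipping introduces no bias while sharply limiting the damage a single outlier can do to the running estimate.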
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression [20.00435452480056]
We develop a novel response-adaptive partition strategy to overcome the limitation of the divide-and-conquer method.
We show that the proposed estimate has a smaller asymptotic mean squared error (AMSE) than the classical dacKRR estimate under mild conditions.
arXiv Detail & Related papers (2021-07-13T04:01:04Z)
- Ridge Regression with Frequent Directions: Statistical and Optimization Perspectives [1.0152838128195465]
We show that Frequent Directions (FD) can be used in the optimization setting through an iterative scheme that yields high-accuracy solutions.
This improves on randomized approaches, which must trade off computing a new sketch at every iteration against the speed of convergence.
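For reference, the generic Frequent Directions sketch (Liberty, 2013) can be implemented directly; this is the basic per-row variant for clarity, not the ridge-specific scheme of that paper. It maintains an $\ell \times d$ sketch $B$ with $B^\top B \approx A^\top A$ and spectral error at most $\|A\|_F^2 / \ell$.

```python
import numpy as np

def frequent_directions(A, ell):
    """Frequent Directions sketch of the rows of A (Liberty, 2013).

    Maintains an ell x d sketch B such that B.T @ B approximates
    A.T @ A from below, with spectral error at most ||A||_F^2 / ell.
    Assumes ell <= d. This basic version runs one SVD per row for
    clarity; practical variants batch the shrink step.
    """
    n, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        B[-1] = row                          # insert into the zeroed last row
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2                   # smallest squared singular value
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s[:, None] * Vt                  # shrink: last row becomes zero
    return B
```

Because the shrink step only ever subtracts energy, $B^\top B \preceq A^\top A$ holds deterministically, which is what makes FD attractive compared to randomized sketches that must be redrawn each iteration.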
arXiv Detail & Related papers (2020-11-06T21:40:38Z)
- Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.