Sketched Ridgeless Linear Regression: The Role of Downsampling
- URL: http://arxiv.org/abs/2302.01088v2
- Date: Fri, 13 Oct 2023 21:14:42 GMT
- Title: Sketched Ridgeless Linear Regression: The Role of Downsampling
- Authors: Xin Chen, Yicheng Zeng, Siyue Yang, Qiang Sun
- Abstract summary: We investigate two out-of-sample prediction risks of the sketched ridgeless least square estimator.
We identify the optimal sketching size that minimizes out-of-sample prediction risks.
We extend our analysis to cover central limit theorems and misspecified models.
- Score: 5.615701056715101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparametrization often helps improve the generalization performance. This
paper presents a dual view of overparametrization suggesting that downsampling
may also help generalize. Focusing on the proportional regime $m\asymp n \asymp
p$, where $m$ represents the sketching size, $n$ is the sample size, and $p$ is
the feature dimensionality, we investigate two out-of-sample prediction risks
of the sketched ridgeless least square estimator. Our findings challenge
conventional beliefs by showing that downsampling does not always harm
generalization but can actually improve it in certain cases. We identify the
optimal sketching size that minimizes out-of-sample prediction risks and
demonstrate that the optimally sketched estimator exhibits more stable risk curves,
eliminating the peaks of those for the full-sample estimator. To facilitate
practical implementation, we propose an empirical procedure to determine the
optimal sketching size. Finally, we extend our analysis to cover central limit
theorems and misspecified models. Numerical studies strongly support our
theory.
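For concreteness, the sketch-and-solve estimator discussed in the abstract can be written in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: a Gaussian sketching matrix $S \in \mathbb{R}^{m \times n}$ compresses the rows of $(X, y)$, and the minimum-norm least-squares solution on the sketched data plays the role of the ridgeless (ridge penalty $\to 0^+$) estimator. The function name and defaults are ours.

```python
import numpy as np

def sketched_ridgeless(X, y, m, seed=None):
    """Sketch-and-solve ridgeless least squares (illustrative sketch).

    Applies a Gaussian sketch S of size m x n to the rows of (X, y),
    then returns the minimum-norm least-squares solution on the
    sketched data. The paper studies this estimator in the
    proportional regime m ~ n ~ p; here m is a free parameter.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch
    SX, Sy = S @ X, S @ y
    # Minimum-norm solution via pseudoinverse (ridgeless limit)
    return np.linalg.pinv(SX) @ Sy
```

In the noiseless, underparametrized case (m > p with SX of full column rank), the sketched estimator recovers the true coefficients exactly; the interesting regimes the paper analyzes arise with noise and m comparable to n and p.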
Related papers
- Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning [5.293069542318491]
We employ random matrix theory to establish consistency of generalized cross validation (GCV) for estimating prediction risks of sketched ridge regression ensembles.
For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally tuned using only the sketch size in infinite ensembles.
We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles.
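The GCV criterion mentioned above can be illustrated with the generic textbook formula for a single ridge fit (not the sketched-ensemble estimator from that paper): $\mathrm{GCV}(\lambda) = n\,\|(I - H_\lambda) y\|^2 / \mathrm{tr}(I - H_\lambda)^2$, where $H_\lambda = X (X^\top X + \lambda I)^{-1} X^\top$ is the ridge hat matrix. Function names are ours.

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """Generalized cross-validation score for a single ridge fit.

    GCV(lam) = n * ||(I - H) y||^2 / tr(I - H)^2, with
    H = X (X'X + lam*I)^{-1} X' the ridge hat matrix. A generic
    textbook formula, not the ensemble estimator from the paper.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return n * float(resid @ resid) / (n - np.trace(H)) ** 2

def select_lambda(X, y, grid):
    """Pick the penalty on a grid by minimizing the GCV score."""
    return min(grid, key=lambda lam: ridge_gcv(X, y, lam))
```

The appeal of GCV is that it estimates prediction risk from a single fit, without refitting on held-out folds; the ensemble results above extend this consistency to sketched ridge ensembles.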
arXiv Detail & Related papers (2023-10-06T16:27:43Z)
- Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation [4.87717454493713]
We study subsampling-based ridge ensembles in the proportional regime.
We prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor.
arXiv Detail & Related papers (2023-04-25T17:43:27Z)
- Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
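The clipping idea can be illustrated with a toy streaming mean estimator (our sketch, not that paper's algorithm or tuning): each sample contributes a gradient that is clipped in magnitude before the SGD step, so rare heavy-tailed samples cannot derail the iterate. The step size and clip level below are illustrative choices.

```python
import numpy as np

def clipped_streaming_mean(stream, eta=0.1, tau=5.0):
    """Streaming mean estimation with clipped SGD (toy sketch).

    For each sample x, the gradient of 0.5 * (theta - x)^2 is
    g = theta - x; clipping g to [-tau, tau] bounds the influence
    of any single heavy-tailed observation. eta and tau are
    illustrative, not tuned values from the paper.
    """
    theta = 0.0
    for t, x in enumerate(stream, start=1):
        g = theta - x                       # per-sample gradient
        g = max(-tau, min(tau, g))          # clip the gradient
        theta -= (eta / np.sqrt(t)) * g     # decaying step size
    return theta
```

With symmetric noise around the true mean, clipping introduces no bias while sharply limiting the damage a single outlier can do to the running estimate.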
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression [20.00435452480056]
We develop a novel response-adaptive partition strategy to overcome the limitation of the divide-and-conquer method.
We show that the proposed estimate has a smaller asymptotic mean squared error (AMSE) than the classical dacKRR estimate under mild conditions.
arXiv Detail & Related papers (2021-07-13T04:01:04Z)
- Ridge Regression with Frequent Directions: Statistical and Optimization Perspectives [1.0152838128195465]
We show that Frequent Directions (FD) can be used in the optimization setting through an iterative scheme that yields high-accuracy solutions.
This improves on randomized approaches, which must trade off computing a new sketch at every iteration against the speed of convergence.
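For reference, the generic Frequent Directions sketch (Liberty, 2013) can be implemented directly; this is the basic per-row variant for clarity, not the ridge-specific scheme of that paper. It maintains an $\ell \times d$ sketch $B$ with $B^\top B \approx A^\top A$ and spectral error at most $\|A\|_F^2 / \ell$.

```python
import numpy as np

def frequent_directions(A, ell):
    """Frequent Directions sketch of the rows of A (Liberty, 2013).

    Maintains an ell x d sketch B such that B.T @ B approximates
    A.T @ A from below, with spectral error at most ||A||_F^2 / ell.
    Assumes ell <= d. This basic version runs one SVD per row for
    clarity; practical variants batch the shrink step.
    """
    n, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        B[-1] = row                          # insert into the zeroed last row
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2                   # smallest squared singular value
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s[:, None] * Vt                  # shrink: last row becomes zero
    return B
```

Because the shrink step only ever subtracts energy, $B^\top B \preceq A^\top A$ holds deterministically, which is what makes FD attractive compared to randomized sketches that must be redrawn each iteration.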
arXiv Detail & Related papers (2020-11-06T21:40:38Z)
- Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
- Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.