Characterizing the SLOPE Trade-off: A Variational Perspective and the
Donoho-Tanner Limit
- URL: http://arxiv.org/abs/2105.13302v1
- Date: Thu, 27 May 2021 16:56:42 GMT
- Authors: Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie J. Su
- Abstract summary: Sorted l1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems.
We show how this technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion (TPP) or, equivalently, between measures of type I error and power.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sorted l1 regularization has been incorporated into many methods for solving
high-dimensional statistical estimation problems, including the SLOPE estimator
in linear regression. In this paper, we study how this relatively new
regularization technique improves variable selection by characterizing the
optimal SLOPE trade-off between the false discovery proportion (FDP) and true
positive proportion (TPP) or, equivalently, between measures of type I error
and power. Assuming a regime of linear sparsity and working under Gaussian
random designs, we obtain an upper bound on the optimal trade-off for SLOPE,
showing its capability of breaking the Donoho-Tanner power limit. To put it
into perspective, this limit is the highest possible power that the Lasso,
which is perhaps the most popular l1-based method, can achieve even with
arbitrarily strong effect sizes. Next, we derive a tight lower bound that
delineates the fundamental limit of sorted l1 regularization in optimally
trading the FDP off for the TPP. Finally, we show that on any problem instance,
SLOPE with a certain regularization sequence outperforms the Lasso, in the
sense of having a smaller FDP, larger TPP and smaller l2 estimation risk
simultaneously. Our proofs are based on a novel technique that reduces a
variational calculus problem to a class of infinite-dimensional convex
optimization problems and a very recent result from approximate message passing
theory.
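To make the FDP/TPP trade-off concrete, the following is a minimal numerical sketch (not the paper's code): SLOPE is fit by proximal gradient descent under a Gaussian random design, using the sorted-l1 proximal operator computed with a pool-adjacent-violators pass, and FDP/TPP are then read off the support of the estimate. The problem sizes, noise level, and decreasing regularization sequence below are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def prox_sorted_l1(v, lam):
    """Prox of the sorted-l1 norm J(x) = sum_i lam_i |x|_(i),
    for lam nonincreasing: shift sorted magnitudes by lam, restore
    monotonicity by pooling adjacent violators, clip at zero."""
    sign, u = np.sign(v), np.abs(v)
    order = np.argsort(u)[::-1]            # sort |v| in decreasing order
    w = u[order] - lam
    sums, counts = [], []                  # pool adjacent violators
    for val in w:
        s, c = val, 1
        while sums and sums[-1] / counts[-1] < s / c:
            s += sums.pop(); c += counts.pop()
        sums.append(s); counts.append(c)
    w = np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])
    x = np.zeros_like(v)
    x[order] = np.clip(w, 0.0, None)       # threshold at zero, undo the sort
    return sign * x

def slope(X, y, lam, iters=300):
    """SLOPE via proximal gradient descent with a fixed 1/L step (illustrative)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta = prox_sorted_l1(beta - step * X.T @ (X @ beta - y), step * lam)
    return beta

# Toy instance with linear sparsity under a Gaussian random design.
rng = np.random.default_rng(0)
n, p, k = 400, 200, 40
beta_true = np.zeros(p); beta_true[:k] = 4.0
X = rng.standard_normal((n, p)) / np.sqrt(n)
y = X @ beta_true + rng.standard_normal(n)

lam = 2.0 * np.linspace(1.0, 0.1, p)       # an illustrative decreasing sequence
beta_hat = slope(X, y, lam)

selected = np.flatnonzero(np.abs(beta_hat) > 1e-8)
tpp = np.isin(selected, np.arange(k)).sum() / k
fdp = (~np.isin(selected, np.arange(k))).sum() / max(len(selected), 1)
print(f"TPP = {tpp:.2f}, FDP = {fdp:.2f}")
```

With a constant sequence lam, the sorted-l1 prox reduces to ordinary soft-thresholding and SLOPE reduces to the Lasso, which is a useful sanity check on the implementation.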
Related papers
- Horizon-Free Regret for Linear Markov Decision Processes [92.02082223856479]
A recent line of works showed that regret bounds in reinforcement learning can be (nearly) independent of the planning horizon.
We give the first horizon-free bound for the popular linear Markov Decision Process (MDP) setting.
In contrast to prior works which explicitly estimate the transition model and compute the inhomogeneous value functions, we directly estimate the value functions and confidence sets.
arXiv Detail & Related papers (2024-03-15T23:50:58Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression [12.443289202402761]
We show the benefits of batch partitioning through the lens of a minimum-norm overparameterized linear regression model.
We characterize the optimal batch size and show it is inversely proportional to the noise level.
We also show that shrinking the batch minimum-norm estimator by a factor equal to the Wiener coefficient further stabilizes it and results in lower quadratic risk in all settings.
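As a quick illustration of the estimator discussed in this entry (a sketch under assumed dimensions and noise level, not the paper's setup): in the overparameterized regime the minimum-norm interpolating estimator is the pseudoinverse solution, and shrinking it by the risk-optimal scalar multiple, a Wiener-style coefficient used here in its oracle form for illustration, never increases the quadratic risk.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200                          # overparameterized: p > n
beta = np.zeros(p); beta[:10] = 1.0     # sparse ground truth (illustrative)
X = rng.standard_normal((n, p)) / np.sqrt(n)
y = X @ beta + 0.5 * rng.standard_normal(n)

# Minimum-norm interpolating estimator: beta_mn = X^+ y.
beta_mn = np.linalg.pinv(X) @ y

# Oracle scalar shrinkage c* = <beta_mn, beta> / ||beta_mn||^2, the
# risk-minimizing multiple of beta_mn (in practice this Wiener-style
# coefficient would be estimated from the noise level, not the truth).
c = (beta_mn @ beta) / (beta_mn @ beta_mn)

risk_plain = np.sum((beta_mn - beta) ** 2)
risk_shrunk = np.sum((c * beta_mn - beta) ** 2)
print(f"c = {c:.3f}, risk: {risk_plain:.3f} -> {risk_shrunk:.3f}")
```

Because c is chosen to minimize the quadratic risk over all scalar multiples of beta_mn, the shrunk estimator's risk is at most that of the plain minimum-norm estimator by construction.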
arXiv Detail & Related papers (2023-06-14T11:02:08Z)
- Faster Projection-Free Augmented Lagrangian Methods via Weak Proximal Oracle [16.290192687098383]
This paper considers a convex composite optimization problem with affine constraints.
Motivated by high-dimensional applications in which exact projection/proximal computations are not tractable, we propose a projection-free augmented Lagrangian-based method.
arXiv Detail & Related papers (2022-10-25T12:51:43Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize the PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- High-dimensional limit theorems for SGD: Effective dynamics and critical scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of stochastic gradient descent (SGD).
We show a critical scaling regime for the step size, below which the effective ballistic dynamics matches gradient flow for the population loss.
Near the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
arXiv Detail & Related papers (2022-06-08T17:42:18Z)
- A Dimensionality Reduction Method for Finding Least Favorable Priors with a Focus on Bregman Divergence [108.28566246421742]
This paper develops a dimensionality reduction method that allows us to move the optimization to a finite-dimensional setting with an explicit bound on the dimension.
In order to make progress on the problem, we restrict ourselves to Bayesian risks induced by a relatively large class of loss functions, namely Bregman divergences.
arXiv Detail & Related papers (2022-02-23T16:22:28Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with a novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- A Stochastic Composite Augmented Lagrangian Method For Reinforcement Learning [9.204659134755795]
We consider the linear programming (LP) formulation for deep reinforcement learning.
The augmented Lagrangian method suffers from the double-sampling obstacle in solving the LP.
A deep parameterized augmented Lagrangian method is proposed.
arXiv Detail & Related papers (2021-05-20T13:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.