Efficient Designs of SLOPE Penalty Sequences in Finite Dimension
- URL: http://arxiv.org/abs/2102.07211v2
- Date: Wed, 17 Feb 2021 02:59:27 GMT
- Title: Efficient Designs of SLOPE Penalty Sequences in Finite Dimension
- Authors: Yiliang Zhang, Zhiqi Bu
- Abstract summary: In linear regression, SLOPE is a new convex analysis method that generalizes the Lasso via the sorted L1 penalty.
In this paper, we propose two efficient algorithms to design the possibly high-dimensional SLOPE penalty.
- Score: 5.787117733071415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In linear regression, SLOPE is a new convex analysis method that generalizes
the Lasso via the sorted L1 penalty: larger fitted coefficients are penalized
more heavily. This magnitude-dependent regularization requires an input of
penalty sequence $\lambda$, instead of a scalar penalty as in the Lasso case,
thus making the design extremely expensive in computation. In this paper, we
propose two efficient algorithms to design the possibly high-dimensional SLOPE
penalty, in order to minimize the mean squared error. For Gaussian data
matrices, we propose a first order Projected Gradient Descent (PGD) under the
Approximate Message Passing regime. For general data matrices, we present a
zero-th order Coordinate Descent (CD) to design a sub-class of SLOPE, referred
to as the k-level SLOPE. Our CD allows a useful trade-off between the accuracy
and the computation speed. We demonstrate the performance of SLOPE with our
designs via extensive experiments on synthetic data and real-world datasets.
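As a concrete illustration of the sorted L1 penalty (our own minimal sketch, not the authors' code; `sorted_l1_penalty` and its arguments are names we introduce), evaluating the penalty only requires sorting the coefficient magnitudes, and the Lasso is recovered when the sequence is constant:

```python
import numpy as np

def sorted_l1_penalty(beta, lam):
    """Sorted-L1 (SLOPE) penalty: sum_i lam[i] * |beta|_(i), where
    |beta|_(1) >= |beta|_(2) >= ... are the sorted magnitudes and
    lam is non-increasing, so larger coefficients pay more."""
    mags = np.sort(np.abs(beta))[::-1]      # magnitudes, largest first
    return float(np.dot(lam, mags))

beta = np.array([0.5, -3.0, 1.2])
lam_slope = np.array([2.0, 1.0, 0.5])       # non-increasing SLOPE sequence
lam_lasso = np.full(3, 1.0)                 # constant sequence = Lasso
print(sorted_l1_penalty(beta, lam_slope))   # 2*3.0 + 1*1.2 + 0.5*0.5 = 7.45
print(sorted_l1_penalty(beta, lam_lasso))   # plain L1 norm = 4.7
```

The design problem the paper addresses is choosing the entire vector `lam`, which is what makes it so much more expensive than tuning the single scalar of the Lasso.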
Related papers
- Sharp Trade-Offs in High-Dimensional Inference via 2-Level SLOPE [20.580487867158364]
We show that 2-level SLOPE offers a robust, scalable alternative to both LASSO and general SLOPE (a toy construction of a 2-level penalty sequence is sketched below).
arXiv Detail & Related papers (2025-07-12T01:57:10Z)
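To make the k-level idea concrete: a 2-level SLOPE sequence takes only two distinct penalty values. A minimal sketch (our own illustration; `two_level_lambda` and its parameters are hypothetical names, and in practice these values are tuned rather than fixed):

```python
import numpy as np

def two_level_lambda(p, split, lam_high, lam_low):
    """A 2-level SLOPE penalty sequence: the `split` largest coefficients
    receive lam_high, the rest lam_low.  lam_high >= lam_low >= 0 keeps
    the sequence non-increasing, as SLOPE requires."""
    assert lam_high >= lam_low >= 0
    lam = np.full(p, lam_low)
    lam[:split] = lam_high
    return lam

print(two_level_lambda(p=8, split=3, lam_high=2.0, lam_low=0.5))
# [2.  2.  2.  0.5 0.5 0.5 0.5 0.5]
```

Restricting the design to the triple (lam_high, lam_low, split) collapses a p-dimensional tuning problem to three parameters, which is what makes the approach scalable.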
- Scalable Approximate Algorithms for Optimal Transport Linear Models [0.769672852567215]
We propose a novel framework for solving a general class of non-negative linear regression models with an entropy-regularized OT datafit term.
We derive simple multiplicative updates for common penalty and datafit terms; a minimal variant of such an update is sketched below.
This method is suitable for large-scale problems due to its simplicity of implementation and straightforward parallelization.
arXiv Detail & Related papers (2025-04-06T20:37:25Z)
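A minimal sketch of a multiplicative update for plain non-negative least squares, in the spirit of (but much simpler than) the updates the paper derives, which also handle the entropy-regularized OT datafit and penalties. All names are ours, and we assume the data `A` and `y` are entrywise non-negative:

```python
import numpy as np

def nnls_multiplicative(A, y, n_iter=500, eps=1e-12):
    """Multiplicative update for min_{b >= 0} ||y - A b||^2 with
    non-negative A and y.  Every factor in the update is non-negative,
    so iterates stay feasible without any projection step."""
    b = np.ones(A.shape[1])
    Aty = A.T @ y
    for _ in range(n_iter):
        b *= Aty / (A.T @ (A @ b) + eps)   # Lee-Seung-style ratio update
    return b
```

Each iteration is a few matrix-vector products and every coordinate updates independently, which is why such schemes are simple to implement and to parallelize.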
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
As language models grow in size, the memory demands of backpropagation increase.
Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.
We show that SubZero enhances fine-tuning and converges faster than standard ZO approaches (a generic subspace gradient estimator is sketched below).
arXiv Detail & Related papers (2024-10-11T17:01:43Z)
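A generic sketch of the underlying idea (ours, not the SubZero algorithm itself): estimate a directional derivative from two loss evaluations, with the perturbation drawn from a random k-dimensional subspace instead of the full parameter space:

```python
import numpy as np

def zo_subspace_grad(loss, theta, k=16, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate restricted to a random
    k-dimensional subspace.  Only loss *values* are needed, so no
    backpropagation (and none of its activation memory) is required."""
    rng = rng or np.random.default_rng()
    d = theta.size
    U = rng.standard_normal((d, k)) / np.sqrt(k)   # random subspace basis
    z = rng.standard_normal(k)                     # perturbation coordinates
    u = U @ z                                      # direction in full space
    slope = (loss(theta + mu * u) - loss(theta - mu * u)) / (2 * mu)
    return slope * u                               # estimate lies in the subspace
```

Used inside an ordinary loop, e.g. `theta -= lr * zo_subspace_grad(loss, theta)`, this replaces the backward pass entirely.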
- PMaF: Deep Declarative Layers for Principal Matrix Features [37.662505982849844]
We explore two differentiable deep declarative layers, namely least squares on sphere (LESS) and implicit eigen decomposition (IED).
Both can be used to represent data features with a low-dimensional vector containing dominant information from a high-dimensional matrix (a toy power-iteration sketch follows).
arXiv Detail & Related papers (2023-06-26T15:13:36Z)
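For intuition about extracting "dominant information" from a matrix: the forward computation behind an eigen-decomposition layer can be as simple as power iteration. This is our toy sketch only; the paper's contribution is making such solutions differentiable via declarative layers, which this snippet does not attempt:

```python
import numpy as np

def dominant_eigvec(M, n_iter=100):
    """Power iteration: compresses a symmetric matrix M into the
    low-dimensional feature given by its dominant eigenvector."""
    v = np.ones(M.shape[0]) / np.sqrt(M.shape[0])
    for _ in range(n_iter):
        v = M @ v
        v /= np.linalg.norm(v)
    return v
```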
- Coordinate Descent for SLOPE [6.838073951329198]
Sorted L-One Penalized Estimation (SLOPE) is a generalization of the lasso with appealing statistical properties.
Current software packages that fit SLOPE rely on algorithms that perform poorly in high dimensions.
We propose a new fast algorithm to solve the SLOPE optimization problem, combining proximal gradient descent and proximal coordinate descent steps (the sorted-L1 proximal step at its core is sketched below).
arXiv Detail & Related papers (2022-10-26T15:20:30Z)
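The workhorse inside any proximal solver for SLOPE is the prox of the sorted-L1 norm, computable with a sort plus a pool-adjacent-violators pass. A compact sketch (ours, not the paper's optimized hybrid solver; function names are our own):

```python
import numpy as np

def prox_sorted_l1(v, lam):
    """Prox of the sorted-L1 norm: sort |v| decreasingly, subtract lam,
    enforce a non-increasing fit by pool-adjacent-violators averaging,
    clip at zero, then undo the sort and restore signs."""
    sign, a = np.sign(v), np.abs(v)
    order = np.argsort(a)[::-1]
    w = a[order] - lam                      # lam assumed non-increasing
    blocks = []                             # (sum, count) of merged blocks
    for x in w:
        total, count = x, 1.0
        while blocks and blocks[-1][0] / blocks[-1][1] <= total / count:
            t, c = blocks.pop()             # merge blocks that violate
            total += t                      # the non-increasing order
            count += c
        blocks.append((total, count))
    fit = np.concatenate([np.full(int(c), max(t / c, 0.0)) for t, c in blocks])
    out = np.empty_like(fit)
    out[order] = fit                        # undo the sort
    return sign * out

def slope_pgd_step(beta, X, y, lam, step):
    """One proximal gradient step for 0.5*||y - X beta||^2 + sorted-L1."""
    grad = X.T @ (X @ beta - y)
    return prox_sorted_l1(beta - step * grad, step * lam)
```

With a constant `lam` this reduces to ordinary soft-thresholding, i.e., an ISTA step for the Lasso.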
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer (a toy example follows below).
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers and achieved performance comparable to pure data-driven networks while using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
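A toy illustration of algorithm unfolding in general (ours; it is not the GDPA-based network from the paper): each iteration of a simple model-based solver becomes a layer whose internal constants, here the step sizes, are exposed as learnable parameters:

```python
import numpy as np

class UnrolledGD:
    """T unrolled gradient-descent iterations on 0.5*x'Ax - b'x;
    one iteration = one 'layer' with its own learnable step size."""
    def __init__(self, T):
        self.steps = np.full(T, 0.1)        # per-layer parameters to train

    def forward(self, A, b, x0):
        x = x0
        for eta in self.steps:              # fixed, finite depth
            x = x - eta * (A @ x - b)       # gradient of the quadratic
        return x
```

Because the depth is fixed and every layer mirrors a solver iteration, the resulting network stays interpretable and uses few parameters, matching the motivation above.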
- Highly Parallel Autoregressive Entity Linking with Discriminative Correction [51.947280241185]
We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
arXiv Detail & Related papers (2021-09-08T17:28:26Z)
- Unbalanced Optimal Transport through Non-negative Penalized Linear Regression [9.668391961887027]
We show that the unbalanced optimal transport (UOT) problem can be reformulated as a non-negative penalized linear regression problem.
We propose novel algorithms inspired by inverse problems and non-negative matrix factorization.
We derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties.
arXiv Detail & Related papers (2021-06-08T07:16:37Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance two facts: an attenuating step-size is required for exact convergence, whereas a constant step-size learns faster, but only up to an error floor.
Rather than fixing the minibatch size and the step-size at the outset, we propose to let these parameters evolve adaptively (see the toy loop below).
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
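A toy loop showing the idea (our sketch under assumed names, not the paper's scheme; `grad_fn(theta, batch)` is a hypothetical callback returning a stochastic gradient computed on `batch` samples):

```python
import numpy as np

def sgd_adaptive_batch(grad_fn, theta, n_steps=200, batch0=8,
                       step=0.1, growth=1.05, batch_max=4096):
    """Keep the step size (nearly) constant for fast early progress,
    and instead grow the minibatch over time: a larger batch lowers
    gradient variance, tightening the error floor that a constant
    step size would otherwise leave."""
    batch = float(batch0)
    for _ in range(n_steps):
        theta = theta - step * grad_fn(theta, int(batch))
        batch = min(batch * growth, batch_max)   # adapt the batch, not the step
    return theta
```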
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT); a sketch-and-solve variant with a Gaussian embedding is illustrated below.
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
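A minimal sketch-and-solve variant with a Gaussian embedding (ours; the paper's methods are iterative, adapt the sketch size to the effective dimension, and also cover the SRHT):

```python
import numpy as np

def sketched_ridge(X, y, mu, m, rng=None):
    """Compress the n rows of (X, y) to m << n rows with a Gaussian
    embedding S, then solve the small ridge problem exactly."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian embedding
    Xs, ys = S @ X, S @ y
    return np.linalg.solve(Xs.T @ Xs + mu * np.eye(d), Xs.T @ ys)
```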
- Modelling High-Dimensional Categorical Data Using Nonconvex Fusion Penalties [7.262048441360131]
Our estimator, called SCOPE, fuses levels together by making their corresponding coefficients exactly equal.
We provide an algorithm for exact and efficient clustering within a block of variables.
arXiv Detail & Related papers (2020-02-28T09:20:41Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.