Proximal Iteration for Nonlinear Adaptive Lasso
- URL: http://arxiv.org/abs/2412.05726v1
- Date: Sat, 07 Dec 2024 19:19:55 GMT
- Title: Proximal Iteration for Nonlinear Adaptive Lasso
- Authors: Nathan Wycoff, Lisa O. Singh, Ali Arab, Katharine M. Donato,
- Abstract summary: We study the approach of treating the penalty coefficients as additional decision variables to be learned in a Maximum a Posteriori manner.
We develop a proximal gradient approach to joint optimization of these together with the parameters of any differentiable cost function.
- Score: 1.866597543169743
- License:
- Abstract: Augmenting a smooth cost function with an $\ell_1$ penalty allows analysts to efficiently conduct estimation and variable selection simultaneously in sophisticated models and can be efficiently implemented using proximal gradient methods. However, one drawback of the $\ell_1$ penalty is bias: nonzero parameters are underestimated in magnitude, motivating techniques such as the Adaptive Lasso which endow each parameter with its own penalty coefficient. But it's not clear how these parameter-specific penalties should be set in complex models. In this article, we study the approach of treating the penalty coefficients as additional decision variables to be learned in a \textit{Maximum a Posteriori} manner, developing a proximal gradient approach to joint optimization of these together with the parameters of any differentiable cost function. Beyond reducing bias in estimates, this procedure can also encourage arbitrary sparsity structure via a prior on the penalty coefficients. We compare our method to implementations of specific sparsity structures for non-Gaussian regression on synthetic and real datasets, finding our more general method to be competitive in terms of both speed and accuracy. We then consider nonlinear models for two case studies: COVID-19 vaccination behavior and international refugee movement, highlighting the applicability of this approach to complex problems and intricate sparsity structures.
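As a rough illustration of the idea in the abstract, here is a minimal Python sketch of a proximal-gradient loop that alternates a soft-thresholding step on the coefficients with a MAP-style update of the parameter-specific penalties. The least-squares cost, the stand-in log prior on the penalty coefficients, and all names (`adaptive_lasso_map`, `alpha`) are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise proximal operator of t * |.| (t may be a vector)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adaptive_lasso_map(X, y, alpha=1.0, n_iter=500):
    """Proximal-gradient sketch: least-squares cost with per-parameter
    l1 penalties lam_j that are themselves updated in a MAP-like fashion
    (illustrative only)."""
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)       # 1/L for the smooth part
    beta = np.zeros(p)
    lam = np.ones(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)                # gradient of 0.5*||y - X beta||^2
        beta = soft_threshold(beta - step * grad,  # prox step with
                              step * lam)          # parameter-specific penalties
        # Closed-form minimizer of lam_j*|beta_j| - alpha*log(lam_j) over lam_j,
        # a simple stand-in prior on the penalty coefficients.
        lam = alpha / (np.abs(beta) + 1e-8)
    return beta, lam
```

With this stand-in log prior the penalty update has the familiar reweighted-l1 form lam_j = alpha / |beta_j|, so large coefficients receive smaller penalties, which is one way the bias of a single global penalty can be reduced.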
Related papers
- Pathwise optimization for bridge-type estimators and its applications [49.1574468325115]
Pathwise methods make it possible to efficiently compute the full solution path of penalized estimators (a generic warm-start sketch follows this entry).
We apply these algorithms to the penalized estimation of processes observed at discrete times.
arXiv Detail & Related papers (2024-12-05T10:38:29Z)
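The entry above concerns bridge-type estimators for discretely observed processes; the sketch below only illustrates the generic pathwise idea it refers to, namely computing the whole regularization path by warm-starting each solve from the previous one. The plain lasso objective, the ISTA inner solver, and the names (`lasso_path`, `n_lambdas`) are assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_path(X, y, n_lambdas=50, n_iter=200):
    """Full regularization path via warm-started ISTA: each solve starts
    from the previous solution, so later (smaller-penalty) problems
    converge in few iterations (illustrative)."""
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    lam_max = np.max(np.abs(X.T @ y))              # smallest lambda giving beta = 0
    lambdas = np.geomspace(lam_max, 1e-3 * lam_max, n_lambdas)
    betas = np.zeros((n_lambdas, p))
    beta = np.zeros(p)                             # warm start carried along the path
    for k, lam in enumerate(lambdas):
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y)
            beta = soft_threshold(beta - step * grad, step * lam)
        betas[k] = beta
    return lambdas, betas
```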
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Parameter-Agnostic Optimization under Relaxed Smoothness [25.608968462899316]
We show that Normalized Stochastic Gradient Descent with Momentum (NSGD-M) can achieve a rate-optimal complexity without prior knowledge of any problem parameter (a minimal NSGD-M sketch follows this entry).
In deterministic settings, the exponential factor can be neutralized by employing Gradient Descent with a Backtracking Line Search.
arXiv Detail & Related papers (2023-11-06T16:39:53Z)
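A minimal sketch of the NSGD-M update mentioned above: a momentum average of stochastic gradients followed by a step of fixed length along its direction, so no smoothness or noise constants enter the step size. The toy oracle and the names (`nsgd_m`, `lr`, `beta`) are assumptions; the paper's relaxed-smoothness analysis and backtracking line-search variant are not reproduced here.

```python
import numpy as np

def nsgd_m(grad_fn, x0, lr=0.01, beta=0.9, n_iter=1000, seed=0):
    """Normalized SGD with momentum: keep an exponential moving average of
    stochastic gradients and step a fixed length lr along its direction
    (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    for _ in range(n_iter):
        g = grad_fn(x, rng)                        # stochastic gradient oracle
        m = beta * m + (1.0 - beta) * g            # momentum buffer (EMA)
        x -= lr * m / (np.linalg.norm(m) + 1e-12)  # normalized (unit-direction) step
    return x

# Toy usage: noisy quadratic with minimum at the origin.
x_star = nsgd_m(lambda x, rng: 2.0 * x + 0.1 * rng.standard_normal(x.shape),
                x0=np.ones(5))
```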
- Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow estimation accuracy to be traded off against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z)
- COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization [4.970364068620608]
We exploit convexity and L-smoothness to improve the noisy estimates outputted by the gradient oracle.
We show that increasing the number and proximity of the queried points leads to better gradient estimates.
We also apply COCO in vanilla settings by plugging it into existing algorithms, such as SGD, Adam or STRSAGA.
arXiv Detail & Related papers (2021-09-07T17:21:09Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work concerns zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost (a generic ZO gradient-estimation sketch follows this entry).
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
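A generic sketch of zeroth-order gradient estimation with coordinate importance sampling, as referenced above: coordinates are drawn according to importance weights and central finite differences supply the derivative estimates. The specific hybrid design of the cited method is not reproduced; the names (`zo_coordinate_grad`, `n_draws`, `mu`, `probs`) are assumptions.

```python
import numpy as np

def zo_coordinate_grad(f, x, n_draws=5, mu=1e-4, probs=None, seed=0):
    """Zeroth-order gradient estimate from function queries only: draw
    coordinates with (optional) importance weights and take central finite
    differences along each, reweighting by 1/p_i to stay unbiased
    (illustrative)."""
    rng = np.random.default_rng(seed)
    d = x.size
    probs = np.full(d, 1.0 / d) if probs is None else probs / probs.sum()
    idx = rng.choice(d, size=n_draws, replace=True, p=probs)
    grad = np.zeros(d)
    for i in idx:
        e = np.zeros(d)
        e[i] = mu
        deriv = (f(x + e) - f(x - e)) / (2.0 * mu)  # two queries per draw
        grad[i] += deriv / (probs[i] * n_draws)     # importance-weighted average
    return grad

# Toy usage: estimate the gradient of a quadratic without calling its gradient.
g = zo_coordinate_grad(lambda v: float(v @ v), np.ones(10))
```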
- DiffPrune: Neural Network Pruning with Deterministic Approximate Binary Gates and $L_0$ Regularization [0.0]
Modern neural network architectures typically have many millions of parameters and can be pruned significantly without substantial loss in effectiveness.
The contribution of this work is two-fold.
The first is a method for approximating a multivariate Bernoulli random variable by means of a deterministic and differentiable transformation of any real-valued random variable.
The second is a method for model selection by element-wise multiplication of parameters with approximate binary gates that may be computed deterministically or stochastically and take on exact zero values (a minimal gate sketch follows this entry).
arXiv Detail & Related papers (2020-12-07T13:08:56Z)
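Below is a minimal sketch of a deterministic, differentiable gate that can output exact zeros, in the spirit of the approximate binary gates described above (a hard-concrete-style stretch-and-clamp construction). This is an assumption for illustration and not necessarily DiffPrune's exact transformation; `gamma` and `zeta` are hypothetical stretch parameters.

```python
import numpy as np

def deterministic_gate(logit, gamma=-0.1, zeta=1.1):
    """Deterministic, almost-everywhere differentiable gate in [0, 1] that
    can take exact zero values: stretch a sigmoid slightly past [0, 1] and
    clamp it back (illustrative)."""
    s = 1.0 / (1.0 + np.exp(-logit))         # sigmoid in (0, 1)
    stretched = s * (zeta - gamma) + gamma   # stretched to (gamma, zeta)
    return np.clip(stretched, 0.0, 1.0)      # exact 0s and 1s are reachable

# Toy usage: gate a weight vector; sufficiently negative logits prune exactly.
w = np.array([0.5, -1.2, 2.0])
logits = np.array([-6.0, 0.0, 6.0])
pruned = w * deterministic_gate(logits)      # first weight gated to exactly 0.0
```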
- Divide and Learn: A Divide and Conquer Approach for Predict+Optimize [50.03608569227359]
The predict+optimize problem combines machine learning of problem coefficients with an optimization problem that uses the predicted coefficients.
We show how to directly express the loss of the optimization problem in terms of the predicted coefficients as a piece-wise linear function.
We propose a novel divide-and-conquer algorithm to tackle optimization problems without this restriction and predict their coefficients using the optimization loss.
arXiv Detail & Related papers (2020-12-04T00:26:56Z)
- Implicit differentiation of Lasso-type models for hyperparameter optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.