Extensions to the Proximal Distance Method of Constrained Optimization
- URL: http://arxiv.org/abs/2009.00801v2
- Date: Tue, 11 Jan 2022 23:10:18 GMT
- Title: Extensions to the Proximal Distance Method of Constrained Optimization
- Authors: Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange
- Abstract summary: We study the problem of minimizing a loss $f(\boldsymbol{x})$ subject to constraints of the form $\boldsymbol{D}\boldsymbol{x} \in S$.
Fusion constraints can capture smoothness, sparsity, or more general constraint patterns.
- Score: 7.813460653362097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The current paper studies the problem of minimizing a loss
$f(\boldsymbol{x})$ subject to constraints of the form
$\boldsymbol{D}\boldsymbol{x} \in S$, where $S$ is a closed set, convex or not,
and $\boldsymbol{D}$ is a matrix that fuses parameters. Fusion constraints can
capture smoothness, sparsity, or more general constraint patterns. To tackle
this generic class of problems, we combine the Beltrami-Courant penalty method
with the proximal distance principle. The latter is driven by minimization of
penalized objectives
$f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$
involving large tuning constants $\rho$ and the squared Euclidean distance of
$\boldsymbol{D}\boldsymbol{x}$ from $S$. The next iterate
$\boldsymbol{x}_{n+1}$ of the corresponding proximal distance algorithm is
constructed from the current iterate $\boldsymbol{x}_n$ by minimizing the
majorizing surrogate function
$f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$.
For fixed $\rho$ and a subanalytic loss $f(\boldsymbol{x})$ and a subanalytic
constraint set $S$, we prove convergence to a stationary point. Under stronger
assumptions, we provide convergence rates and demonstrate linear local
convergence. We also construct a steepest descent (SD) variant to avoid costly
linear system solves. To benchmark our algorithms, we compare against the
alternating direction method of multipliers (ADMM). Our extensive numerical
tests include problems on metric projection, convex regression, convex
clustering, total variation image denoising, and projection of a matrix to a
good condition number. These experiments demonstrate the superior speed and
acceptable accuracy of our steepest descent variant on high-dimensional problems.
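As a concrete illustration (not code from the paper), the sketch below applies the proximal distance iteration described in the abstract to a quadratic loss with a fused-sparsity constraint. The least-squares loss, the $k$-sparse constraint set $S$, the first-difference fusion matrix $\boldsymbol{D}$, the penalty schedule, and all function names are assumptions made for the example; only the surrogate $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$ comes from the abstract.

```python
import numpy as np

def project_sparse(z, k):
    """Projection onto the (assumed) constraint set S of k-sparse vectors:
    keep the k largest-magnitude entries of z and zero out the rest."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    return out

def proximal_distance(A, b, D, k, rho=1.0, rho_mult=1.2, iters=200, tol=1e-8):
    """Illustrative proximal distance iteration for
        min 1/2 ||A x - b||^2  subject to  D x in S (k-sparse).
    Each step minimizes the majorizing surrogate
        f(x) + rho/2 * ||D x - P_S(D x_n)||^2,
    which for a quadratic loss reduces to the linear system
        (A^T A + rho D^T D) x = A^T b + rho D^T P_S(D x_n).
    """
    x = np.zeros(A.shape[1])
    AtA, Atb, DtD = A.T @ A, A.T @ b, D.T @ D
    for _ in range(iters):
        z = project_sparse(D @ x, k)                      # project D x_n onto S
        x_new = np.linalg.solve(AtA + rho * DtD, Atb + rho * (D.T @ z))
        if np.linalg.norm(x_new - x) <= tol * (1 + np.linalg.norm(x)):
            return x_new
        x = x_new
        rho *= rho_mult                                   # slowly increase the penalty constant
    return x

# Usage example: recover a piecewise-constant signal via fused sparsity (all data simulated).
rng = np.random.default_rng(0)
n, p = 100, 40
A = rng.standard_normal((n, p))
x_true = np.repeat([0.0, 2.0, -1.0, 0.0], p // 4)
b = A @ x_true + 0.1 * rng.standard_normal(n)
D = np.diff(np.eye(p), axis=0)                            # first-difference fusion matrix
x_hat = proximal_distance(A, b, D, k=3)
```

The steepest descent (SD) variant mentioned above would replace the exact linear solve with a gradient step on the same surrogate (with an exact line search in the quadratic case), trading per-iteration accuracy for cheaper updates on high-dimensional problems.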
Related papers
- Log-concave Sampling over a Convex Body with a Barrier: a Robust and Unified Dikin Walk [12.842909157175582]
We consider the problem of sampling from a $d$-dimensional log-concave distribution $\pi(\theta) \propto \exp(-f(\theta))$ for $L$-Lipschitz $f$.
We propose a robust sampling framework that computes spectral approximations to the Hessian of the barrier functions in each iteration.
arXiv Detail & Related papers (2024-10-08T05:32:51Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors [33.0212223058894]
The problem of recovering a signal from a quadratic system $y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\ i=1,\ldots,m$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging.
This paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol{x}$.
arXiv Detail & Related papers (2023-09-16T16:00:07Z) - Convergence of a Normal Map-based Prox-SGD Method under the KL
Inequality [0.0]
We present a novel normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) and analyze its convergence under the Kurdyka-Łojasiewicz (KL) inequality.
arXiv Detail & Related papers (2023-05-10T01:12:11Z) - Near-optimal fitting of ellipsoids to random points [68.12685213894112]
A basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = \Omega(\,d^2/\mathrm{polylog}(d)\,)$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z) - Minimax Optimal Regression over Sobolev Spaces via Laplacian
Regularization on Neighborhood Graphs [25.597646488273558]
We study the statistical properties of Laplacian smoothing, a graph-based approach to nonparametric regression.
We prove that Laplacian smoothing is manifold-adaptive.
arXiv Detail & Related papers (2021-06-03T01:20:41Z) - Optimal Combination of Linear and Spectral Estimators for Generalized
Linear Models [59.015960528781115]
We show how to optimally combine $\hat{\boldsymbol{x}}^{\rm L}$ and $\hat{\boldsymbol{x}}^{\rm s}$.
In order to establish the limiting distribution of $(\boldsymbol{x}, \hat{\boldsymbol{x}}^{\rm L}, \hat{\boldsymbol{x}}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm.
arXiv Detail & Related papers (2020-08-07T18:20:05Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x}, \mathbf{y})$, efficiently.
arXiv Detail & Related papers (2019-06-02T03:03:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.