Extensions to the Proximal Distance Method of Constrained Optimization
- URL: http://arxiv.org/abs/2009.00801v2
- Date: Tue, 11 Jan 2022 23:10:18 GMT
- Title: Extensions to the Proximal Distance Method of Constrained Optimization
- Authors: Alfonso Landeros, Oscar Hernan Madrid Padilla, Hua Zhou, Kenneth Lange
- Abstract summary: We study the problem of minimizing a loss $f(\boldsymbol{x})$ subject to constraints of the form $\boldsymbol{D}\boldsymbol{x} \in S$.
Fusion constraints can capture smoothness, sparsity, or more general constraint patterns.
- Score: 7.813460653362097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The current paper studies the problem of minimizing a loss
$f(\boldsymbol{x})$ subject to constraints of the form
$\boldsymbol{D}\boldsymbol{x} \in S$, where $S$ is a closed set, convex or not,
and $\boldsymbol{D}$ is a matrix that fuses parameters. Fusion constraints can
capture smoothness, sparsity, or more general constraint patterns. To tackle
this generic class of problems, we combine the Beltrami-Courant penalty method
with the proximal distance principle. The latter is driven by minimization of
penalized objectives
$f(\boldsymbol{x})+\frac{\rho}{2}\text{dist}(\boldsymbol{D}\boldsymbol{x},S)^2$
involving large tuning constants $\rho$ and the squared Euclidean distance of
$\boldsymbol{D}\boldsymbol{x}$ from $S$. The next iterate
$\boldsymbol{x}_{n+1}$ of the corresponding proximal distance algorithm is
constructed from the current iterate $\boldsymbol{x}_n$ by minimizing the
majorizing surrogate function
$f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$.
For fixed $\rho$ and a subanalytic loss $f(\boldsymbol{x})$ and a subanalytic
constraint set $S$, we prove convergence to a stationary point. Under stronger
assumptions, we provide convergence rates and demonstrate linear local
convergence. We also construct a steepest descent (SD) variant to avoid costly
linear system solves. To benchmark our algorithms, we compare against the
alternating direction method of multipliers (ADMM). Our extensive numerical
tests include problems on metric projection, convex regression, convex
clustering, total variation image denoising, and projection of a matrix to a
good condition number. These experiments demonstrate the superior speed and
acceptable accuracy of our steepest descent variant on high-dimensional problems.
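As a concrete illustration (not code from the paper), the sketch below applies the proximal distance iteration described in the abstract to a quadratic loss with a fused-sparsity constraint. The least-squares loss, the $k$-sparse constraint set $S$, the first-difference fusion matrix $\boldsymbol{D}$, the penalty schedule, and all function names are assumptions made for the example; only the surrogate $f(\boldsymbol{x})+\frac{\rho}{2}\|\boldsymbol{D}\boldsymbol{x}-\mathcal{P}_{S}(\boldsymbol{D}\boldsymbol{x}_n)\|^2$ comes from the abstract.

```python
import numpy as np

def project_sparse(z, k):
    """Projection onto the (assumed) constraint set S of k-sparse vectors:
    keep the k largest-magnitude entries of z and zero out the rest."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    return out

def proximal_distance(A, b, D, k, rho=1.0, rho_mult=1.2, iters=200, tol=1e-8):
    """Illustrative proximal distance iteration for
        min 1/2 ||A x - b||^2  subject to  D x in S (k-sparse).
    Each step minimizes the majorizing surrogate
        f(x) + rho/2 * ||D x - P_S(D x_n)||^2,
    which for a quadratic loss reduces to the linear system
        (A^T A + rho D^T D) x = A^T b + rho D^T P_S(D x_n).
    """
    x = np.zeros(A.shape[1])
    AtA, Atb, DtD = A.T @ A, A.T @ b, D.T @ D
    for _ in range(iters):
        z = project_sparse(D @ x, k)                      # project D x_n onto S
        x_new = np.linalg.solve(AtA + rho * DtD, Atb + rho * (D.T @ z))
        if np.linalg.norm(x_new - x) <= tol * (1 + np.linalg.norm(x)):
            return x_new
        x = x_new
        rho *= rho_mult                                   # slowly increase the penalty constant
    return x

# Usage example: recover a piecewise-constant signal via fused sparsity (all data simulated).
rng = np.random.default_rng(0)
n, p = 100, 40
A = rng.standard_normal((n, p))
x_true = np.repeat([0.0, 2.0, -1.0, 0.0], p // 4)
b = A @ x_true + 0.1 * rng.standard_normal(n)
D = np.diff(np.eye(p), axis=0)                            # first-difference fusion matrix
x_hat = proximal_distance(A, b, D, k=3)
```

The steepest descent (SD) variant mentioned above would replace the exact linear solve with a gradient step on the same surrogate (with an exact line search in the quadratic case), trading per-iteration accuracy for cheaper updates on high-dimensional problems.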
Related papers
- Log-concave Sampling over a Convex Body with a Barrier: a Robust and Unified Dikin Walk [12.842909157175582]
We consider the problem of sampling from a $d$-dimensional log-concave distribution $\pi(\theta) \propto \exp(-f(\theta))$ for $L$-Lipschitz $f$.
We propose a robust sampling framework that computes spectral approximations to the Hessian of the barrier functions in each iteration.
arXiv Detail & Related papers (2024-10-08T05:32:51Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors [33.0212223058894]
The problem of recovering a signal from a quadratic system $y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\ i=1,\ldots,m$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging.
This paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol{x}$.
arXiv Detail & Related papers (2023-09-16T16:00:07Z) - Convergence of a Normal Map-based Prox-SGD Method under the KL
Inequality [0.0]
We present a novel normal map-based algorithm ($\mathsf{norM}\text{-}\mathsf{SGD}$) and analyze its convergence under the Kurdyka-Łojasiewicz (KL) inequality.
arXiv Detail & Related papers (2023-05-10T01:12:11Z) - Near-optimal fitting of ellipsoids to random points [68.12685213894112]
A basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some $n = \Omega(\,d^2/\mathrm{polylog}(d)\,)$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z) - Minimax Optimal Regression over Sobolev Spaces via Laplacian
Regularization on Neighborhood Graphs [25.597646488273558]
We study the statistical properties of Laplacian smoothing, a graph-based approach to nonparametric regression.
We prove that Laplacian smoothing is manifold-adaptive.
arXiv Detail & Related papers (2021-06-03T01:20:41Z) - Optimal Combination of Linear and Spectral Estimators for Generalized
Linear Models [59.015960528781115]
We show how to optimally combine $\hat{\boldsymbol{x}}^{\rm L}$ and $\hat{\boldsymbol{x}}^{\rm s}$.
In order to establish the limiting distribution of $(\boldsymbol{x}, \hat{\boldsymbol{x}}^{\rm L}, \hat{\boldsymbol{x}}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm.
arXiv Detail & Related papers (2020-08-07T18:20:05Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x}, \mathbf{y})$, efficiently.
arXiv Detail & Related papers (2019-06-02T03:03:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.