Approximation of the Proximal Operator of the $\ell_\infty$ Norm Using a Neural Network
- URL: http://arxiv.org/abs/2408.11211v1
- Date: Tue, 20 Aug 2024 22:12:30 GMT
- Title: Approximation of the Proximal Operator of the $\ell_\infty$ Norm Using a Neural Network
- Authors: Kathryn Linehan, Radu Balan
- Abstract summary: We present an approximation of $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$ using a neural network.
A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process.
We show that the network outperforms a "vanilla neural network" that does not use feature selection.
- Score: 1.7265013728931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computing the proximal operator of the $\ell_\infty$ norm, $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$, generally requires a sort of the input data, or at least a partial sort similar to quicksort. In order to avoid using a sort, we present an $O(m)$ approximation of $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$ using a neural network. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy of the approximation, feature importance, and computational efficiency of the approach. We show that the network outperforms a "vanilla neural network" that does not use feature selection. We also present an algorithm with corresponding theory to calculate $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$ exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation.
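For context, the Moreau decomposition gives $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x}) = \mathbf{x} - \Pi_{\{\|\mathbf{z}\|_1 \le \alpha\}}(\mathbf{x})$, where $\Pi$ is the Euclidean projection onto the $\ell_1$ ball of radius $\alpha$. The sketch below illustrates this standard sort-based route to the exact operator; it is not the paper's algorithm or its $O(m)$ neural approximation, and the function names are illustrative.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball {z : ||z||_1 <= radius}."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                      # magnitudes, descending (the sort)
    cumsum = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > (cumsum - radius))[0][-1]
    theta = (cumsum[rho] - radius) / (rho + 1.0)      # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(x, alpha):
    """Exact prox of alpha*||.||_inf via Moreau: x minus the l1-ball projection."""
    return x - project_l1_ball(x, alpha)
```

The projection step is dominated by an $O(m \log m)$ sort, which is precisely the cost the paper's $O(m)$ network-based approximation is designed to avoid.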
Related papers
- Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^\ast \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^\ast$ is a hidden unit vector.
We show that a proper learner attains $L^2$-error of $O(\mathrm{OPT})+\epsilon$.
arXiv Detail & Related papers (2024-11-08T17:10:38Z) - On the Complexity of Neural Computation in Superposition [3.9803704378699103]
Recent advances in the understanding of neural networks suggest that superposition is a key mechanism underlying the computational efficiency of large-scale networks.
We show a nearly tight upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m'} \log m')$ neurons and $O(m' \log^2 m')$ parameters.
Our results open a path for using complexity theoretic techniques in neural network interpretability research.
arXiv Detail & Related papers (2024-09-05T18:58:59Z) - On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks [2.4112990554464235]
We show that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition.
A key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth.
arXiv Detail & Related papers (2024-09-02T02:26:01Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives [0.0]
We consider a gradient-based optimization method applied to a function of parameters $\boldsymbol{\theta}$.
This framework encompasses many common use-cases, such as training neural networks by gradient descent.
arXiv Detail & Related papers (2023-12-06T20:24:05Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $varepsilon which runs in $Obig(frack2varepsilon2(sqrtkd + log(Nd))big.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Optimal Approximation Rates for Deep ReLU Neural Networks on Sobolev and Besov Spaces [2.7195102129095003]
Deep neural networks with the ReLU activation function can approximate functions in the Sobolev spaces $W^s(L_q(\Omega))$ and Besov spaces $B^s_r(L_q(\Omega))$.
This problem is important when studying the application of neural networks in a variety of fields.
arXiv Detail & Related papers (2022-11-25T23:32:26Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) \le C\,\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Distributed Saddle-Point Problems Under Similarity [173.19083235638104]
We show that a given suboptimality $\epsilon > 0$ is achieved over master/workers networks in $\Omega(\cdot)$ rounds of communication.
We then propose algorithms matching the lower bounds for either type of network (up to log factors).
We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
arXiv Detail & Related papers (2021-07-22T14:25:16Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$ (a sketch of the resulting factored-kernel iterations appears after this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
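To make the mechanism in the last entry concrete, here is a rough sketch of Sinkhorn updates when the kernel $e^{-c(x,y)} = \langle \varphi(x), \varphi(y) \rangle$ factors through nonnegative $r$-dimensional features. This is a generic factored-kernel illustration, assuming the entropic regularization is absorbed into $\varphi$; it is not the authors' implementation, and `phi_x`, `phi_y`, and the marginals `a`, `b` are assumed inputs.

```python
import numpy as np

def sinkhorn_factored(phi_x, phi_y, a, b, n_iters=200):
    """Sinkhorn iterations with a kernel given in factored form K = phi_x @ phi_y.T,
    where phi_x (n x r) and phi_y (m x r) are nonnegative feature matrices.
    Each iteration costs O((n + m) r) because K is never formed explicitly."""
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (phi_x @ (phi_y.T @ v))   # K   @ v without the n x m matrix
        v = b / (phi_y @ (phi_x.T @ u))   # K.T @ u without the n x m matrix
    return u, v  # the transport plan is diag(u) @ K @ diag(v), kept in factored form
```

With $r \ll n$, each matrix-vector product drops from $O(n^2)$ to $O(nr)$, which is the source of the claimed linear-time behavior.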