Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations
- URL: http://arxiv.org/abs/2405.00837v1
- Date: Wed, 1 May 2024 19:56:52 GMT
- Title: Locality Regularized Reconstruction: Structured Sparsity and Delaunay Triangulations
- Authors: Marshall Mueller, James M. Murphy, Abiy Tasissa,
- Abstract summary: Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction.
In this work we seek $mathbfw$ that forms a local reconstruction of $mathbfy$ by solving a regularized least squares regression problem.
We prove that, for all levels of regularization and under a mild condition that the columns of $mathbfX$ have a unique Delaunay triangulation, the optimal coefficients' number of non-zero entries is upper bounded by $d+1$.
- Score: 7.148312060227714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = \mathbf{X} \in \mathbb{R}^{d \times n}$ and a vector $\mathbf{y} \in \mathbb{R}^d$, the goal is to find coefficients $\mathbf{w} \in \mathbb{R}^n$ so that $\mathbf{X} \mathbf{w} \approx \mathbf{y}$, subject to some desired structure on $\mathbf{w}$. In this work we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least squares regression problem. We obtain local solutions through a locality function that promotes the use of columns of $\mathbf{X}$ that are close to $\mathbf{y}$ when used as a regularization term. We prove that, for all levels of regularization and under a mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the optimal coefficients' number of non-zero entries is upper bounded by $d+1$, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that for any $\mathbf{y}$ contained in the convex hull of $\mathbf{X}$ there exists a regime of regularization parameter such that the optimal coefficients are supported on the vertices of the Delaunay simplex containing $\mathbf{y}$. This provides an interpretation of the sparsity as having structure obtained implicitly from the Delaunay triangulation of $\mathbf{X}$. We demonstrate that our locality regularized problem can be solved in comparable time to other methods that identify the containing Delaunay simplex.
Related papers
- Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise [38.551072383777594]
We study the problem of learning a single neuron with respect to the $L2$ loss in the presence of adversarial distribution shifts.
A new algorithm is developed to approximate the vector vector squared loss with respect to the worst distribution that is in the $chi2$divergence to the $mathcalp_0$.
arXiv Detail & Related papers (2024-11-11T03:43:52Z) - Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization.
Our approach significantly improves upon the $mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$ requires.
We also prove that there is no natural first-order method with $mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Compressing Large Language Models using Low Rank and Low Precision Decomposition [46.30918750022739]
This work introduces $rm CALDERA$ -- a new post-training LLM compression algorithm.
It harnesses the inherent low-rank structure of a weight matrix $mathbfW$ by approximating it via a low-rank, low-precision decomposition.
Results show that compressing LlaMa-$2$ $7$B/$13B$/$70$B and LlaMa-$3$ $8$B models using $rm CALDERA$ outperforms existing post-training compression techniques.
arXiv Detail & Related papers (2024-05-29T08:42:30Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Fast Partition-Based Cross-Validation With Centering and Scaling for $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ [0.0]
We present algorithms that substantially accelerate partition-based cross-validation for machine learning models.
Our algorithms have applications in model selection for, e.g., principal component analysis (PCA), principal component regression (PCR), ridge regression (RR), ordinary least squares (OLS) and partial least squares (PLS)
Unlike alternatives found in the literature, we avoid data leakage due to preprocessing.
arXiv Detail & Related papers (2024-01-24T02:16:03Z) - Quantum Oblivious LWE Sampling and Insecurity of Standard Model Lattice-Based SNARKs [4.130591018565202]
The Learning Errors With Errors ($mathsfLWE$) problem asks to find $mathbfs$ from an input of the form $(mathbfAmathbfs+mathbfe$)
We do not focus on solving $mathsfLWE$ but on the task of sampling instances.
Our main result is a quantum-time algorithm that samples well-distributed $mathsfLWE$ instances while provably not knowing the solution.
arXiv Detail & Related papers (2024-01-08T10:55:41Z) - Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
This paper considers the task of linear regression with shuffled labels.
$mathbf Y in mathbb Rntimes m, mathbf Pi in mathbb Rntimes p, mathbf B in mathbb Rptimes m$, and $mathbf Win mathbb Rntimes m$, respectively.
arXiv Detail & Related papers (2023-10-02T16:44:47Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $tildemathcalO(nsqrtT)$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z) - On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider non-con minimax problems, $min_mathbfx max_mathhidoty f(mathbfdoty)$ efficiently.
arXiv Detail & Related papers (2019-06-02T03:03:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.