Clustering, multicollinearity, and singular vectors
- URL: http://arxiv.org/abs/2008.03368v1
- Date: Fri, 7 Aug 2020 20:32:34 GMT
- Title: Clustering, multicollinearity, and singular vectors
- Authors: Hamid Usefi
- Abstract summary: Let $A$ be a matrix with its pseudo-matrix $Adagger$ and set $S=I-AdaggerA$.
We prove that, after re-ordering the columns of $A$, the matrix $S$ has a block-diagonal form where each block corresponds to a set of linearly dependent columns.
- Score: 2.055949720959582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Let $A$ be a matrix with its pseudo-matrix $A^{\dagger}$ and set
$S=I-A^{\dagger}A$. We prove that, after re-ordering the columns of $A$, the
matrix $S$ has a block-diagonal form where each block corresponds to a set of
linearly dependent columns. This allows us to identify redundant columns in
$A$. We explore some applications in supervised and unsupervised learning,
specially feature selection, clustering, and sensitivity of solutions of least
squares solutions.
Related papers
- Polynomial Algorithms for Simultaneous Unitary Similarity and Equivalence [0.0]
We present an algorithm to solve the Simultaneous Unitary Similarity(S.U.S) problem.<n>The algorithms have a complexity of $O(pn4)$.<n>This work finds application in Quantum Evolution, Quantum gate design and Canonical Simulation.
arXiv Detail & Related papers (2025-11-08T09:24:59Z) - Equality condition for a matrix inequality by partial transpose [4.115763274264731]
We study the equality condition for a matrix inequality generated by partial transpose.<n>We show that $sumK_j=1 A_j otimes B_j$ is locally equivalent to an elegant block-diagonal form.
arXiv Detail & Related papers (2025-08-26T03:42:46Z) - LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions [54.54897832889028]
We show that for any $K$, there is a universal set" $U subset [n]$ of size independent of $n$, such that for any $Q$ and any row $i$, the large attention scores $A_i,j$ in row $i$ of $A$ all have $jin U$.
We empirically show the benefits of our scheme for vision transformers, showing how to train new models that use our universal set while training as well.
arXiv Detail & Related papers (2024-10-07T19:47:13Z) - Efficient Matrix Factorization Via Householder Reflections [2.3326951882644553]
We show that the exact recovery of the factors $mathbfH$ and $mathbfX$ from $mathbfY$ is guaranteed with $Omega$ columns in $mathbfY$.
We hope the techniques in this work help in developing alternate algorithms for dictionary learning.
arXiv Detail & Related papers (2024-05-13T11:13:49Z) - One-sided Matrix Completion from Two Observations Per Row [95.87811229292056]
We propose a natural algorithm that involves imputing the missing values of the matrix $XTX$.
We evaluate our algorithm on one-sided recovery of synthetic data and low-coverage genome sequencing.
arXiv Detail & Related papers (2023-06-06T22:35:16Z) - Leverage Score Sampling for Tensor Product Matrices in Input Sparsity
Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in total time.
arXiv Detail & Related papers (2022-02-09T15:26:03Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $texttr(AR(z))$, for $R(z) = (frac1nXXT- zI_p)-1$ and $Ain mathcal M_p$ deterministic, have a standard deviation of order $O(|A|_* / sqrt n)$.
Here, we show that $|mathbb E[R(z)] - tilde R(z)|_F
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $KsubsetmathbbRdtimes n$, given access to $AinmathbbRdtimes n$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - Farthest sampling segmentation of triangulated surfaces [0.0]
Farthest Sampling (FSS) is a new method for segmentation of triangulated surfaces.
The FSS method does not depend on parameters that must be tuned by hand and it is very flexible.
Numerical experiments with several metrics and a large variety of 3D triangular meshes show that the segmentations obtained computing less than the 10% of columns $W$ are as good as those obtained from clustering the rows of the full matrix $W$.
arXiv Detail & Related papers (2020-12-01T13:31:44Z) - Vector-Matrix-Vector Queries for Solving Linear Algebra, Statistics, and
Graph Problems [58.83118651518438]
We consider the general problem of learning about a matrix through vector-matrix-vector queries.
These queries provide the value of $boldsymbolumathrmTboldsymbolMboldsymbolv$ over a fixed field.
We provide new upper and lower bounds for a wide variety of problems, spanning linear algebra, statistics, and graphs.
arXiv Detail & Related papers (2020-06-24T19:33:49Z) - Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss [76.02734481158458]
It is known that in the worst case, to obtain a good rank-$k$ approximation to a matrix, one needs an arbitrarily large $nOmega(1)$ number of columns.
We show that under certain minimal and realistic distributional settings, it is possible to obtain a $(k/epsilon)$-approximation with a nearly linear running time and poly$(k/epsilon)+O(klog n)$ columns.
This is the first algorithm of any kind for achieving a $(k/epsilon)$-approximation for entrywise
arXiv Detail & Related papers (2020-04-16T22:57:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.