Compressing Heavy-Tailed Weight Matrices for Non-Vacuous Generalization Bounds
- URL: http://arxiv.org/abs/2105.11025v1
- Date: Sun, 23 May 2021 21:36:33 GMT
- Title: Compressing Heavy-Tailed Weight Matrices for Non-Vacuous Generalization Bounds
- Authors: John Y. Shin
- Abstract summary: The compression framework of arXiv:1802.05296 is utilized to show that matrices with heavy-tail distributed matrix elements can be compressed.
The action of these matrices on a vector is discussed, and how they may relate to compression and resilient classification is analyzed.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heavy-tailed distributions have been studied in statistics, random matrix
theory, physics, and econometrics as models of correlated systems, among other
domains. Further, heavy-tail distributed eigenvalues of the covariance matrix
of the weight matrices in neural networks have been shown to empirically
correlate with test set accuracy in several works (e.g. arXiv:1901.08276), but
a formal relationship between heavy-tail distributed parameters and
generalization bounds had yet to be demonstrated. In this work, the compression
framework of arXiv:1802.05296 is utilized to show that matrices with heavy-tail
distributed matrix elements can be compressed, resulting in networks with
sparse weight matrices. Since the parameter count has been reduced to a sum of
the non-zero elements of sparse matrices, the compression framework allows us
to bound the generalization gap of the resulting compressed network with a
non-vacuous generalization bound. Further, the action of these matrices on a
vector is discussed, and how they may relate to compression and resilient
classification is analyzed.
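To make the compression mechanism concrete, here is a minimal sketch of the core idea, assuming a Pareto distribution for the matrix elements and a simple magnitude threshold (both illustrative choices, not the paper's actual algorithm or bound computation): in a heavy-tailed weight matrix, a few large entries carry most of the matrix's action on a vector, so zeroing the many small entries yields a sparse matrix that acts nearly identically, and the surviving non-zero count is the effective parameter count that a compression-based bound in the spirit of arXiv:1802.05296 scales with.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512

# Weight matrix with heavy-tail distributed elements: Pareto-distributed
# magnitudes (tail index alpha < 2, so a few entries are very large)
# with random signs. The distribution choice is illustrative.
alpha = 1.5
W = rng.pareto(alpha, size=(n, n)) * rng.choice([-1.0, 1.0], size=(n, n))

# "Compress" by magnitude thresholding: zero out every entry below the
# 90th percentile of |W|, keeping only the dominant heavy-tail entries.
tau = np.quantile(np.abs(W), 0.90)
W_sparse = np.where(np.abs(W) >= tau, W, 0.0)

# The effective parameter count of the compressed matrix is its number
# of non-zero entries; a compression-framework generalization bound is
# stated in terms of this (much smaller) count.
nnz = np.count_nonzero(W_sparse)
print(f"kept {nnz} of {W.size} entries ({100 * nnz / W.size:.1f}%)")

# The action on a vector is approximately preserved, since the heavy
# tail concentrates most of the matrix's mass in the few large entries.
x = rng.standard_normal(n)
rel_err = np.linalg.norm(W @ x - W_sparse @ x) / np.linalg.norm(W @ x)
print(f"relative error of the sparsified action: {rel_err:.3f}")
```

Raising the threshold keeps fewer entries and loosens the approximation, mirroring the accuracy-versus-compression trade-off that the generalization bound formalizes.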
Related papers
- Spectral distribution of sparse Gaussian Ensembles of Real Asymmetric Matrices [0.0]
We analyze the spectral statistics of the multiparametric Gaussian ensembles of real asymmetric matrices.
Our formulation provides a common mathematical description of the spectral statistics for a wide range of sparse real-asymmetric ensembles.
arXiv Detail & Related papers (2025-07-28T17:13:44Z)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy [57.54306942529943]
We propose an Approximately Orthogonal Fine-Tuning (AOFT) strategy for representing the low-rank weight matrices.
Our method achieves competitive performance across a range of downstream image classification tasks.
arXiv Detail & Related papers (2025-07-17T16:09:05Z)
- Spectral Estimation with Free Decompression [47.81955761814048]
We introduce a novel method of "free decompression" to estimate the spectrum of very large (impalpable) matrices.
Our method can be used to extrapolate from the empirical spectral densities of small submatrices to infer the eigenspectrum of extremely large (impalpable) matrices.
arXiv Detail & Related papers (2025-06-13T17:49:25Z)
- Perturbation Analysis of Singular Values in Concatenated Matrices [0.0]
How does the singular value spectrum of a concatenated matrix relate to the spectra of its individual submatrices?
We set up analytical bounds that quantify the stability of singular values under small perturbations of the submatrices.
The results demonstrate that if the submatrices are close in norm, the dominant singular values of the concatenated matrix remain stable, enabling controlled trade-offs between accuracy and compression.
arXiv Detail & Related papers (2025-03-11T09:28:57Z)
- Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices [6.7523635840772505]
Circular and non-flat data distributions are prevalent across diverse domains of data science.
A principled approach to accounting for the underlying geometry of such data is pivotal.
This work lays the groundwork for extending classical machine learning and statistical methods to more complex and structured data.
arXiv Detail & Related papers (2025-02-03T16:46:46Z)
- Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry [63.694184882697435]
Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations.
arXiv Detail & Related papers (2024-07-15T07:11:44Z)
- Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
arXiv Detail & Related papers (2024-05-23T12:26:25Z)
- Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z)
- Implicit Regularization of Gradient Flow on One-Layer Softmax Attention [10.060496091806694]
We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model.
Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices.
arXiv Detail & Related papers (2024-03-13T17:02:27Z)
- Quantitative deterministic equivalent of sample covariance matrices with a general dependence structure [0.0]
We prove quantitative bounds involving both the dimensions and the spectral parameter, in particular allowing it to get closer to the real positive semi-line.
As applications, we obtain a new bound for the convergence in Kolmogorov distance of the empirical spectral distributions of these general models.
arXiv Detail & Related papers (2022-11-23T15:50:31Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Riemannian statistics meets random matrix theory: towards learning from high-dimensional covariance matrices [2.352645870795664]
This paper observes that there seems to exist no practical method of computing the normalising factors associated with Riemannian Gaussian distributions on spaces of high-dimensional covariance matrices.
It is shown that the missing method comes from an unexpected new connection with random matrix theory.
Numerical experiments are conducted which demonstrate how this new approximation can unlock the difficulties which have impeded applications to real-world datasets.
arXiv Detail & Related papers (2022-03-01T03:16:50Z)
- Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantee.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- A simpler spectral approach for clustering in directed networks [1.52292571922932]
We show that using the eigenvalue/eigenvector decomposition of the adjacency matrix is simpler than all common methods.
We provide numerical evidence for the superiority of the Gaussian Mixture clustering over the widely used k-means algorithm.
arXiv Detail & Related papers (2021-02-05T14:16:45Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Squashed Shifted PMI Matrix: Bridging Word Embeddings and Hyperbolic Spaces [7.855758501691244]
We show that removing sigmoid transformation in the skip-gram with negative sampling does not harm the quality of word vectors significantly.
We also factorize a squashed shifted PMI matrix which, in turn, can be treated as a connection probabilities matrix of a random graph.
arXiv Detail & Related papers (2020-02-27T09:50:41Z)
- Relative Error Bound Analysis for Nuclear Norm Regularized Matrix Completion [101.83262280224729]
We develop a relative error bound for nuclear norm regularized matrix completion.
We derive a relative upper bound for recovering the best low-rank approximation of the unknown matrix.
arXiv Detail & Related papers (2015-04-26T13:12:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.