Position: Curvature Matrices Should Be Democratized via Linear Operators
- URL: http://arxiv.org/abs/2501.19183v1
- Date: Fri, 31 Jan 2025 14:46:30 GMT
- Title: Position: Curvature Matrices Should Be Democratized via Linear Operators
- Authors: Felix Dangel, Runa Eschenhagen, Weronika Ormaniec, Andres Fernandez, Lukas Tatzel, Agustinus Kristiadi,
- Abstract summary: Linear operators provide a general, scalable, and user-friendly abstraction to handle curvature matrices. $\textit{curvlinops}$ is a library that provides curvature matrices through a unified linear operator interface. We demonstrate with $\textit{curvlinops}$ how this interface can hide complexity, simplify applications, be interoperable with other libraries, and scale to large NNs.
- Score: 6.946287154076936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured large matrices are prevalent in machine learning. A particularly important class is curvature matrices like the Hessian, which are central to understanding the loss landscape of neural nets (NNs), and enable second-order optimization, uncertainty quantification, model pruning, data attribution, and more. However, curvature computations can be challenging due to the complexity of automatic differentiation, and the variety and structural assumptions of curvature proxies, like sparsity and Kronecker factorization. In this position paper, we argue that linear operators -- an interface for performing matrix-vector products -- provide a general, scalable, and user-friendly abstraction to handle curvature matrices. To support this position, we developed $\textit{curvlinops}$, a library that provides curvature matrices through a unified linear operator interface. We demonstrate with $\textit{curvlinops}$ how this interface can hide complexity, simplify applications, be extensible and interoperable with other libraries, and scale to large NNs.
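As a rough illustration of the position (an illustrative sketch only, not $\textit{curvlinops}$'s actual API), the linear-operator abstraction boils down to exposing a curvature matrix solely through matrix-vector products. The sketch below wraps a Hessian-vector product, computed with PyTorch double backpropagation, in SciPy's `LinearOperator`; the helper `hessian_operator` and the toy loss are hypothetical names chosen for the example.

```python
# Minimal sketch of the linear-operator idea (illustrative only, not the curvlinops API):
# expose the Hessian of a loss solely through matrix-vector products.
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def hessian_operator(loss_fn, theta):
    """Hessian of loss_fn at the flat parameter vector theta, wrapped as a LinearOperator."""
    theta = theta.detach().to(torch.float64)
    n = theta.numel()

    def matvec(v):
        v = torch.from_numpy(np.asarray(v, dtype=np.float64).ravel())
        x = theta.clone().requires_grad_(True)
        (grad,) = torch.autograd.grad(loss_fn(x), x, create_graph=True)
        # Hessian-vector product via double backprop: H v = d(g . v) / dx
        (hv,) = torch.autograd.grad(grad @ v, x)
        return hv.detach().numpy()

    return LinearOperator((n, n), matvec=matvec, dtype=np.float64)

# Example: leading Hessian eigenvalue of a toy non-quadratic loss, found by an
# iterative eigensolver that only ever calls matvec.
H = hessian_operator(lambda x: 0.5 * x @ x + 0.1 * (x ** 4).sum(), torch.randn(100))
top_eigenvalue = eigsh(H, k=1, return_eigenvectors=False)
```

Iterative routines such as `eigsh` or conjugate gradients then operate on the Hessian without ever materializing its $n \times n$ entries, which is the sense in which the interface "hides complexity" and scales.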
Related papers
- Fast Homomorphic Linear Algebra with BLAS [11.269481520748839]
Homomorphic encryption opens a wide range of applications in privacy-preserving data manipulation, notably in AI.
Many of those applications require significant linear algebra computations (matrix-vector and matrix-matrix products).
This central role of linear algebra computations goes far beyond homomorphic algebra and applies to most areas of scientific computing.
We show that the efficiency loss of CKKS-based encrypted square matrix multiplication relative to double-precision floating-point square matrix multiplication is a mere factor of 4-12, depending on the precise situation.
arXiv Detail & Related papers (2025-03-20T12:19:47Z)
- Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent to the linear Transformer.
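As a hedged sketch of why this holds (my reading of the claim, not the paper's exact construction): write single-head linear attention with $A = W_Q W_K^\top$ and $B = W_V$ as
\[
f(X)_i \;=\; \sum_j \bigl(x_i^\top A\, x_j\bigr)\, B\, x_j .
\]
Treating the products $A_{ab} B_{cd}$ as one free parameter tensor, each output coordinate is linear in that tensor applied to the data-dependent features $x_i \otimes \sum_j x_j x_j^\top$, i.e. a linear predictor over a fixed cubic feature map, which is the RKHS view.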
arXiv Detail & Related papers (2024-10-14T02:41:01Z)
- Projected Tensor-Tensor Products for Efficient Computation of Optimal Multiway Data Representations [0.0]
We propose a new projected tensor-tensor product that relaxes the invertibility restriction to reduce computational overhead.
We provide theory to prove the matrix mimeticity and the optimality of compressed representations within the projected product framework.
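For context (standard background on the $\star_M$ family of tensor-tensor products; how exactly the paper modifies it is not spelled out here): for an invertible $M \in \mathbb{R}^{n_3 \times n_3}$,
\[
\mathcal{A} \star_M \mathcal{B} \;=\; \bigl( (\mathcal{A} \times_3 M) \,\triangle\, (\mathcal{B} \times_3 M) \bigr) \times_3 M^{-1},
\]
where $\times_3$ is the mode-3 product and $\triangle$ multiplies corresponding frontal slices. The invertibility of $M$ is the restriction the projected product relaxes.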
arXiv Detail & Related papers (2024-09-28T16:29:54Z)
- Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry [63.694184882697435]
Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations.
This paper provides a comprehensive and unified understanding of the matrix logarithm and power from a Riemannian geometry perspective.
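To fix notation (standard facts about SPD matrix functions, not claims specific to this paper): for an SPD covariance $\Sigma = U \operatorname{diag}(\lambda_1,\dots,\lambda_d) U^\top$, the normalizations in question act on the eigenvalues,
\[
\log(\Sigma) \;=\; U \operatorname{diag}(\log \lambda_1, \dots, \log \lambda_d)\, U^\top,
\qquad
\Sigma^{\alpha} \;=\; U \operatorname{diag}(\lambda_1^{\alpha}, \dots, \lambda_d^{\alpha})\, U^\top,
\]
and the Riemannian perspective studies these maps as transports between the SPD manifold and spaces where Euclidean computations are better behaved.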
arXiv Detail & Related papers (2024-07-15T07:11:44Z)
- Optimal Matrix-Mimetic Tensor Algebras via Variable Projection [0.0]
Matrix mimeticity arises from interpreting tensors as operators that can be multiplied, factorized, and analyzed analogous to matrices.
We learn optimal linear mappings and corresponding tensor representations without relying on prior knowledge of the data.
We provide original theory of uniqueness of the transformation and convergence analysis of our variable-projection-based algorithm.
arXiv Detail & Related papers (2024-06-11T04:52:23Z)
- CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra [62.37017125812101]
We propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA.
By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms.
We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
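To make the compositional idea concrete (an illustrative sketch, not CoLA's actual API), structured operators can be built from matvec-only pieces, and compositions such as sums inherit their matvec from the parts, so the dense matrix is never formed:

```python
# Illustrative sketch of compositional linear operators (not CoLA's actual API):
# each piece defines only a matvec, and the sum derives its matvec from the parts.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def diagonal(d):
    d = np.asarray(d, dtype=float)
    return LinearOperator((d.size, d.size), matvec=lambda v: d * np.ravel(v), dtype=float)

def low_rank(U):
    U = np.asarray(U, dtype=float)
    # (U U^T) v evaluated as U (U^T v): O(nk) cost instead of O(n^2).
    return LinearOperator((U.shape[0],) * 2, matvec=lambda v: U @ (U.T @ np.ravel(v)), dtype=float)

n, k = 2000, 10
A = diagonal(np.full(n, 2.0)) + low_rank(np.random.randn(n, k))  # "diagonal + low rank"
x, info = cg(A, np.ones(n))  # conjugate gradients needs only matvecs of A
```

The dispatch-rule idea is that such compositions can also pick specialized algorithms (e.g. Woodbury-style solves for diagonal-plus-low-rank) instead of the generic iterative fallback shown here.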
arXiv Detail & Related papers (2023-09-06T14:59:38Z)
- Sufficient dimension reduction for feature matrices [3.04585143845864]
We propose a method called principal support matrix machine (PSMM) for the matrix sufficient dimension reduction.
Our numerical analysis demonstrates that the PSMM outperforms existing methods and has strong interpretability in real data applications.
arXiv Detail & Related papers (2023-03-07T23:16:46Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
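For intuition (the Gaussian special case, not the paper's full elliptical formulation), penalized maximum likelihood graph learning takes the familiar form
\[
\hat{\Theta} \;=\; \arg\min_{\Theta \succ 0}\; -\log\det \Theta + \operatorname{tr}(S\Theta) + \lambda\, h(\Theta),
\]
where $S$ is the sample covariance and $h$ encodes the desired structure; the low-rank constraint corresponds, roughly, to a factor-model covariance $\Sigma = H H^\top + \Psi$ with $H$ of fixed rank, which is where the fixed-rank positive semi-definite geometry enters.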
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Tensor Relational Algebra for Machine Learning System Design [7.764107702934616]
We present an alternative implementation abstraction called the tensor relational algebra (TRA).
TRA is a set-based algebra based on the relational algebra.
Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML in distributed clusters.
arXiv Detail & Related papers (2020-09-01T15:51:24Z)
- Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
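A hedged reading of the eigendecomposition-free idea: when the desired output is the eigenvector of $A^\top A$ with smallest eigenvalue (as in many linear least-squares formulations), one can avoid backpropagating through an eigensolver by penalizing the residual of the ground-truth solution $e_*$ directly,
\[
\mathcal{L}(A) \;=\; e_*^\top A^\top A\, e_* \;=\; \lVert A e_* \rVert_2^2,
\]
which involves only matrix products and has well-behaved gradients with respect to $A$.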
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
- Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)