Simplex Random Features
- URL: http://arxiv.org/abs/2301.13856v2
- Date: Sat, 7 Oct 2023 15:55:57 GMT
- Title: Simplex Random Features
- Authors: Isaac Reid, Krzysztof Choromanski, Valerii Likhosherstov, Adrian
Weller
- Abstract summary: We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels.
We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels.
We show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.
- Score: 53.97976744884616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Simplex Random Features (SimRFs), a new random feature (RF)
mechanism for unbiased approximation of the softmax and Gaussian kernels by
geometrical correlation of random projection vectors. We prove that SimRFs
provide the smallest possible mean square error (MSE) on unbiased estimates of
these kernels among the class of weight-independent geometrically-coupled
positive random feature (PRF) mechanisms, substantially outperforming the
previously most accurate Orthogonal Random Features at no observable extra
cost. We present a more computationally expensive SimRFs+ variant, which we
prove is asymptotically optimal in the broader family of weight-dependent
geometrical coupling schemes (which permit correlations between random vector
directions and norms). In extensive empirical studies, we show consistent gains
provided by SimRFs in settings including pointwise kernel estimation,
nonparametric classification and scalable Transformers.
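To make the mechanism concrete, below is a minimal NumPy sketch (not the authors' reference implementation) of weight-independent SimRFs for the Gaussian kernel, compared against i.i.d. and orthogonal (ORF) couplings. It follows the construction described in the abstract: the d projection directions in each block point to the vertices of a regular simplex (pairwise dot product −1/(d−1)), are randomly rotated, and receive norms drawn from the χ_d distribution so that each vector stays marginally Gaussian. Function names such as `simplex_block` and `prf` are our own.

```python
import numpy as np

def haar_orthogonal(d, rng):
    """Haar-distributed random orthogonal matrix (sign-fixed QR of a Gaussian)."""
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    return Q * np.sign(np.diag(R))

def orthogonal_block(d, rng):
    """One ORF block: d orthogonal directions with chi_d norms,
    so each row is marginally N(0, I_d)."""
    norms = np.sqrt(rng.chisquare(d, size=d))
    return norms[:, None] * haar_orthogonal(d, rng)

def simplex_block(d, rng):
    """One SimRF block: d unit directions forming a regular simplex
    (pairwise dot product -1/(d-1)), randomly rotated, with chi_d norms."""
    S = np.sqrt(d / (d - 1)) * (np.eye(d) - np.ones((d, d)) / d)
    norms = np.sqrt(rng.chisquare(d, size=d))
    return norms[:, None] * (S @ haar_orthogonal(d, rng).T)

def prf(X, W):
    """Positive random features for the Gaussian kernel exp(-||x-y||^2 / 2):
    phi(x) = exp(Wx - ||x||^2) / sqrt(m), so E[phi(x) . phi(y)] = K(x, y)."""
    return np.exp(X @ W.T - np.sum(X**2, axis=1, keepdims=True)) / np.sqrt(len(W))

rng = np.random.default_rng(0)
d, n_blocks = 16, 32
X = rng.standard_normal((8, d)) / np.sqrt(d)
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # exact kernel

couplings = {"iid":        lambda: rng.standard_normal((d, d)),
             "orthogonal": lambda: orthogonal_block(d, rng),
             "simplex":    lambda: simplex_block(d, rng)}
for name, block in couplings.items():
    W = np.vstack([block() for _ in range(n_blocks)])  # m = n_blocks * d features
    Phi = prf(X, W)
    print(f"{name:10s} MSE: {np.mean((Phi @ Phi.T - K) ** 2):.3e}")
```

Because each projection vector remains marginally N(0, I_d), all three couplings yield unbiased estimators; averaged over many seeds, the printed MSEs should reproduce the ordering claimed in the abstract (i.i.d. > orthogonal > simplex), though any single run is noisy.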
Related papers
- von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z)
- Variance-Reducing Couplings for Random Features
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2024-05-26T12:25:09Z)
- Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator [9.92828543462075]
Nonlinear functions often incur various hardware and compute overheads.
Stochastic computing (SC) has emerged as a promising approach to tackle this challenge by trading output precision for hardware simplicity.
This paper proposes a first-of-its-kind stochastic multivariate universal-radix finite-state machine (SMURF).
Experiments demonstrate the superiority of SMURF, requiring only 16.07% of the area and 14.45% of the power consumption of a Taylor-series approximation.
arXiv Detail & Related papers (2024-05-03T02:53:32Z)
- Error Bounds for Learning with Vector-Valued Random Features [2.375038919274297]
This paper provides a comprehensive error analysis of learning with vector-valued random features (RFs).
The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting.
arXiv Detail & Related papers (2023-05-26T18:00:08Z)
- Chefs' Random Tables: Non-Trigonometric Random Features [39.282051468586666]
We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels.
CRTs are an alternative to standard random kitchen sink (RKS) methods, which inherently rely on trigonometric maps.
We present variants of CRTs where RFs are positive, a key requirement for applications in recent low-rank Transformers (a sketch contrasting trigonometric and positive features appears after this list).
arXiv Detail & Related papers (2022-05-30T11:37:21Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels, called hybrid random features (HRFs).
HRFs automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z)
- Sinkhorn Natural Gradient for Generative Models [125.89871274202439]
We propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence.
We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically.
In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate the efficiency and efficacy of the method.
arXiv Detail & Related papers (2020-11-09T02:51:17Z)
- A Dynamical Mean-Field Theory for Learning in Restricted Boltzmann Machines [2.8021833233819486]
We define a message-passing algorithm for computing magnetizations in Boltzmann machines.
We prove the global convergence of the algorithm under a stability criterion and compute convergence rates showing excellent agreement with numerical simulations.
arXiv Detail & Related papers (2020-05-04T15:19:31Z)
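For context on the trigonometric-versus-positive distinction raised in the Chefs' Random Tables entry above, here is a brief sketch of the two classical feature maps it refers to. This illustrates standard random kitchen sinks and positive random features, not the CRT mechanism itself (which is a different, non-trigonometric construction); names such as `rks_features` are our own.

```python
import numpy as np

def rks_features(X, W):
    """Trigonometric random kitchen sinks (random Fourier features):
    phi(x) = [cos(Wx); sin(Wx)] / sqrt(m) gives an unbiased estimate of
    the Gaussian kernel, but individual features can be negative."""
    Z = X @ W.T
    return np.concatenate([np.cos(Z), np.sin(Z)], axis=1) / np.sqrt(len(W))

def positive_features(X, W):
    """Positive random features: phi(x) = exp(Wx - ||x||^2) / sqrt(m),
    also unbiased for the Gaussian kernel, with strictly positive entries."""
    return np.exp(X @ W.T - np.sum(X**2, axis=1, keepdims=True)) / np.sqrt(len(W))

rng = np.random.default_rng(1)
d, m = 8, 4096
X = rng.standard_normal((4, d)) / np.sqrt(d)
W = rng.standard_normal((m, d))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # exact kernel
for name, Phi in [("trigonometric", rks_features(X, W)),
                  ("positive",      positive_features(X, W))]:
    print(f"{name:13s} max |error|: {np.abs(Phi @ Phi.T - K).max():.3e}")
```

Both maps estimate the same Gaussian kernel, but only the positive features are strictly positive, which is the property the entry highlights as essential for low-rank Transformer attention.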
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information (including all listed content) and is not responsible for any consequences arising from its use.