Chefs' Random Tables: Non-Trigonometric Random Features
- URL: http://arxiv.org/abs/2205.15317v1
- Date: Mon, 30 May 2022 11:37:21 GMT
- Title: Chefs' Random Tables: Non-Trigonometric Random Features
- Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick
Liu, Tamas Sarlos, Adrian Weller
- Abstract summary: We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels.
CRTs are an alternative to standard random kitchen sink (RKS) methods, which inherently rely on trigonometric maps.
We present variants of CRTs where RFs are positive, a key requirement for applications in recent low-rank Transformers.
- Score: 39.282051468586666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce chefs' random tables (CRTs), a new class of non-trigonometric
random features (RFs) to approximate Gaussian and softmax kernels. CRTs are an
alternative to standard random kitchen sink (RKS) methods, which inherently
rely on trigonometric maps. We present variants of CRTs where RFs are
positive, a key requirement for applications in recent low-rank Transformers.
Further variance reduction is possible by leveraging statistics which are
simple to compute. One instantiation of CRTs, the optimal positive random
features (OPRFs), is to our knowledge the first RF method for unbiased softmax
kernel estimation with positive and bounded RFs, resulting in exponentially
small tails and much lower variance than its counterparts. As we show,
orthogonal random features applied in OPRFs provide additional variance
reduction for any dimensionality $d$ (not only asymptotically for sufficiently
large $d$, as for RKS). We test CRTs on many tasks ranging from non-parametric
classification to training Transformers for text, speech and image data,
obtaining new state-of-the-art results for low-rank text Transformers, while
providing linear space and time complexity.
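As a concrete illustration of the contrast drawn in the abstract, below is a minimal NumPy sketch (ours, not the authors' code) of the classic trigonometric RKS map alongside FAVOR+-style positive random features, using the block-orthogonal Gaussian projections the abstract refers to. The OPRF variant introduced in the paper additionally optimizes a scalar parameter of the feature map, which this sketch omits; all function and variable names here are our own.

```python
import numpy as np

def trig_features(X, W, b):
    # Classic RKS map: sqrt(2/m) * cos(W x + b) estimates the Gaussian
    # kernel exp(-||x - y||^2 / 2) in expectation, but features can be
    # negative, which is problematic for low-rank attention.
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

def positive_features(X, W):
    # Positive RFs (Performer/FAVOR+ style): exp(W x - ||x||^2 / 2) / sqrt(m)
    # gives an unbiased, strictly positive estimate of the softmax kernel
    # exp(x^T y), since E[exp(w^T (x + y))] = exp(||x + y||^2 / 2).
    m = W.shape[0]
    sq = 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)
    return np.exp(X @ W.T - sq) / np.sqrt(m)

def orthogonal_gaussian(m, d, rng):
    # Block-orthogonal Gaussian matrix: orthogonalize d x d Gaussian blocks
    # via QR, then rescale rows to chi-distributed norms so each row still
    # marginally follows N(0, I_d).
    blocks = []
    for _ in range((m + d - 1) // d):
        Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        norms = np.linalg.norm(rng.standard_normal((d, d)), axis=1)
        blocks.append(Q * norms[:, None])
    return np.vstack(blocks)[:m]

rng = np.random.default_rng(0)
d, m = 16, 256
x, y = rng.standard_normal(d) / 4.0, rng.standard_normal(d) / 4.0
W = orthogonal_gaussian(m, d, rng)

# Softmax kernel exp(x^T y) via positive RFs.
sm_exact = np.exp(x @ y)
sm_est = (positive_features(x[None], W) @ positive_features(y[None], W).T)[0, 0]

# Gaussian kernel via trigonometric RFs.
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
gauss_exact = np.exp(-np.sum((x - y) ** 2) / 2.0)
gauss_est = (trig_features(x[None], W, b) @ trig_features(y[None], W, b).T)[0, 0]

print(sm_exact, sm_est, gauss_exact, gauss_est)
```

Note that these FAVOR+-style features are positive but not bounded; per the abstract, the OPRFs introduced in the paper are the first to combine positivity with boundedness, which is what yields the exponentially small tails.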
Related papers
- Adaptive Random Fourier Features Training Stabilized By Resampling With Applications in Image Regression [0.8947831206263182]
We present an enhanced adaptive random Fourier features (ARFF) training algorithm for shallow neural networks.
This method uses a particle-filter-type resampling technique to stabilize the training process and reduce sensitivity to parameter choices.
arXiv Detail & Related papers (2024-10-08T22:08:03Z)
- Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2024-05-26T12:25:09Z)
- Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution [56.089473862929886]
We present a novel way to design neural fields such that points can be queried with an adaptive Gaussian point spread function (PSF).
With its theoretically guaranteed anti-aliasing, our method sets a new state of the art for arbitrary-scale single image super-resolution.
arXiv Detail & Related papers (2023-11-29T14:01:28Z)
- FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features [39.282051468586666]
We propose parameterized, positive, non-trigonometric RFs which approximate Gaussian and softmax kernels.
We show that our methods lead to variance reduction in practice and outperform previous methods in a kernel regression task.
We also present FAVOR#, a method for self-attention approximation in Transformers.
arXiv Detail & Related papers (2023-02-01T22:43:29Z)
- Simplex Random Features [53.97976744884616]
We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels.
We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels.
We show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.
arXiv Detail & Related papers (2023-01-31T18:53:39Z)
- The Terminating-Random Experiments Selector: Fast High-Dimensional Variable Selection with False Discovery Rate Control [10.86851797584794]
The T-Rex selector controls a user-defined target false discovery rate (FDR).
Experiments are conducted on a combination of the original predictors and multiple sets of randomly generated dummy predictors.
arXiv Detail & Related papers (2021-10-12T14:52:46Z)
- Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels, called hybrid random features (HRFs).
HRFs automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK by sketching the expansions of arc-cosine kernels (see the arc-cosine random-feature sketch after this list).
We show that a linear regressor trained on our CNTK features matches the accuracy of exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Towards Unbiased Random Features with Lower Variance For Stationary Indefinite Kernels [26.57122949130266]
Our algorithm achieves lower variance and approximation error than existing kernel approximation methods.
With a better approximation of the originally selected kernels, improved classification accuracy and regression ability are obtained.
arXiv Detail & Related papers (2021-04-13T13:56:50Z)
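The "Scaling Neural Tangent Kernels" entry above builds on arc-cosine kernels, which admit a simple ReLU random-feature interpretation. The sketch below (ours; the paper's actual contribution is a near input-sparsity time sketching algorithm, which is not reproduced here) checks the first-order arc-cosine kernel of Cho and Saul against its ReLU random-feature estimate; all names are our own.

```python
import numpy as np

def arccos1_exact(x, y):
    # Closed-form first-order arc-cosine kernel (Cho & Saul, 2009):
    # k(x, y) = ||x|| ||y|| (sin(t) + (pi - t) cos(t)) / pi, t = angle(x, y).
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    t = np.arccos(np.clip(x @ y / (nx * ny), -1.0, 1.0))
    return nx * ny * (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi

def arccos1_features(X, W):
    # ReLU random features: sqrt(2/m) * max(W x, 0) estimates the
    # first-order arc-cosine kernel in expectation.
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.maximum(X @ W.T, 0.0)

rng = np.random.default_rng(1)
d, m = 8, 4096
x, y = rng.standard_normal(d), rng.standard_normal(d)
W = rng.standard_normal((m, d))
est = (arccos1_features(x[None], W) @ arccos1_features(y[None], W).T)[0, 0]
print(arccos1_exact(x, y), est)
```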