Improved Random Features for Dot Product Kernels
- URL: http://arxiv.org/abs/2201.08712v4
- Date: Tue, 13 Aug 2024 14:22:10 GMT
- Title: Improved Random Features for Dot Product Kernels
- Authors: Jonas Wacker, Motonobu Kanagawa, Maurizio Filippone,
- Abstract summary: We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels.
We show empirically that the use of complex features can significantly reduce the variances of these approximations.
We develop a data-driven optimization approach to improve random feature approximations for general dot product kernels.
- Score: 12.321353062415701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dot product kernels, such as polynomial and exponential (softmax) kernels, are among the most widely used kernels in machine learning, as they enable modeling the interactions between input features, which is crucial in applications like computer vision, natural language processing, and recommender systems. We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels, to make these kernels more useful in large scale learning. First, we present a generalization of existing random feature approximations for polynomial kernels, such as Rademacher and Gaussian sketches and TensorSRHT, using complex-valued random features. We show empirically that the use of complex features can significantly reduce the variances of these approximations. Second, we provide a theoretical analysis for understanding the factors affecting the efficiency of various random feature approximations, by deriving closed-form expressions for their variances. These variance formulas elucidate conditions under which certain approximations (e.g., TensorSRHT) achieve lower variances than others (e.g., Rademacher sketches), and conditions under which the use of complex features leads to lower variances than real features. Third, by using these variance formulas, which can be evaluated in practice, we develop a data-driven optimization approach to improve random feature approximations for general dot product kernels, which is also applicable to the Gaussian kernel. We describe the improvements brought by these contributions with extensive experiments on a variety of tasks and datasets.
Related papers
- Optimal Kernel Choice for Score Function-based Causal Discovery [92.65034439889872]
We propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data.
We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms kernel selection methods.
arXiv Detail & Related papers (2024-07-14T09:32:20Z) - Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning.
We find couplings to improve RFs defined on both Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2024-05-26T12:25:09Z) - Approximation properties relative to continuous scale space for hybrid discretizations of Gaussian derivative operators [0.5439020425819]
This paper presents an analysis of properties of two hybrid discretization methods for Gaussian derivatives.
The motivation for studying these discretization methods is that in situations when multiple spatial derivatives of different order are needed at the same scale level, they can be computed significantly more efficiently.
arXiv Detail & Related papers (2024-05-08T14:44:34Z) - Posterior Contraction Rates for Mat\'ern Gaussian Processes on
Riemannian Manifolds [51.68005047958965]
We show that intrinsic Gaussian processes can achieve better performance in practice.
Our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency.
arXiv Detail & Related papers (2023-09-19T20:30:58Z) - Tanimoto Random Features for Scalable Molecular Machine Learning [2.5112706394511766]
We propose two kinds of novel random features to allow the Tanimoto kernel to scale to large datasets.
We show that these random features are effective at approximating the Tanimoto coefficient of real-world datasets.
arXiv Detail & Related papers (2023-06-26T16:11:11Z) - Local Random Feature Approximations of the Gaussian Kernel [14.230653042112834]
We focus on the popular Gaussian kernel and on techniques to linearize kernel-based models by means of random feature approximations.
We show that such approaches yield poor results when modelling high-frequency data, and we propose a novel localization scheme that improves kernel approximations and downstream performance significantly.
arXiv Detail & Related papers (2022-04-12T09:52:36Z) - Gaussian Processes and Statistical Decision-making in Non-Euclidean
Spaces [96.53463532832939]
We develop techniques for broadening the applicability of Gaussian processes.
We introduce a wide class of efficient approximations built from this viewpoint.
We develop a collection of Gaussian process models over non-Euclidean spaces.
arXiv Detail & Related papers (2022-02-22T01:42:57Z) - Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs)
HRFs automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Towards Unbiased Random Features with Lower Variance For Stationary
Indefinite Kernels [26.57122949130266]
Our algorithm achieves lower variance and approximation error compared with the existing kernel approximation methods.
With better approximation to the originally selected kernels, improved classification accuracy and regression ability is obtained.
arXiv Detail & Related papers (2021-04-13T13:56:50Z) - Advanced Stationary and Non-Stationary Kernel Designs for Domain-Aware
Gaussian Processes [0.0]
We propose advanced kernel designs that only allow for functions with certain desirable characteristics to be elements of the reproducing kernel Hilbert space (RKHS)
We will show the impact of advanced kernel designs on Gaussian processes using several synthetic and two scientific data sets.
arXiv Detail & Related papers (2021-02-05T22:07:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.