Tanimoto Random Features for Scalable Molecular Machine Learning
- URL: http://arxiv.org/abs/2306.14809v2
- Date: Mon, 13 Nov 2023 21:56:13 GMT
- Authors: Austin Tripp, Sergio Bacallado, Sukriti Singh, José Miguel Hernández-Lobato
- Abstract summary: We propose two kinds of novel random features to allow the Tanimoto kernel to scale to large datasets.
We show that these random features are effective at approximating the Tanimoto coefficient of real-world datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Tanimoto coefficient is commonly used to measure the similarity between
molecules represented as discrete fingerprints, either as a distance metric or
a positive definite kernel. While many kernel methods can be accelerated using
random feature approximations, at present there is a lack of such
approximations for the Tanimoto kernel. In this paper we propose two kinds of
novel random features to allow this kernel to scale to large datasets, and in
the process discover a novel extension of the kernel to real-valued vectors. We
theoretically characterize these random features, and provide error bounds on
the spectral norm of the Gram matrix. Experimentally, we show that these random
features are effective at approximating the Tanimoto coefficient of real-world
datasets and are useful for molecular property prediction and optimization
tasks.
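The abstract's central object, the Tanimoto coefficient, has a simple closed form that the paper extends from binary fingerprints to real-valued vectors. A minimal sketch (the toy fingerprints are hypothetical examples, not from the paper):

```python
def tanimoto(x, y):
    """Tanimoto similarity T(x, y) = <x, y> / (<x, x> + <y, y> - <x, y>).

    On binary fingerprints this reduces to |intersection| / |union|;
    the same algebraic form is what gets extended to real-valued vectors.
    """
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return dot / (sum(xi * xi for xi in x) + sum(yi * yi for yi in y) - dot)

# Two toy binary "fingerprints" (hypothetical 5-bit examples).
a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]
print(tanimoto(a, b))  # intersection = 2, union = 4 -> 0.5
```

Computing this kernel exactly for all pairs of an n-molecule dataset costs O(n^2) evaluations, which is the scaling bottleneck the proposed random features are designed to avoid.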
Related papers
- Variance-Reducing Couplings for Random Features: Perspectives from Optimal Transport [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning, replacing exact kernel evaluations with Monte Carlo estimates.
We tackle this through the unifying framework of optimal transport, using theoretical insights and numerical algorithms to develop novel, high-performing RF couplings for kernels defined on Euclidean and discrete input spaces.
We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
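The "Monte Carlo estimates" mentioned above can be illustrated with the classic random Fourier features construction for the Gaussian kernel (a standard textbook sketch, not the couplings proposed in that paper; dimensions and the bandwidth `sigma` are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 3, 5000, 1.0  # input dim, number of features, kernel bandwidth

# Random frequencies and phases: W ~ N(0, I / sigma^2), b ~ U[0, 2*pi].
W = rng.normal(scale=1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    """Feature map with phi(x).phi(y) ~= exp(-|x - y|^2 / (2 sigma^2))."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.normal(size=d)
y = rng.normal(size=d)
exact = np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))
approx = phi(x) @ phi(y)  # Monte Carlo estimate of the exact kernel value
```

Each feature contributes one Monte Carlo sample, so the approximation error shrinks at the usual 1/sqrt(D) rate; variance-reduced couplings of the rows of `W` aim to tighten this.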
arXiv Detail & Related papers (2024-05-26T12:25:09Z) - Orthogonal Random Features: Explicit Forms and Sharp Inequalities [6.889425872704067]
We analyze the bias and the variance of the kernel approximation based on orthogonal random features.
We provide explicit expressions for these quantities using normalized Bessel functions and derive sharp exponential bounds.
arXiv Detail & Related papers (2023-10-11T10:40:43Z) - General Graph Random Features [42.75616308187867]
We propose a novel random walk-based algorithm for unbiased estimation of arbitrary functions of a weighted adjacency matrix.
Our algorithm enjoys subquadratic time complexity with respect to the number of nodes, overcoming the notoriously prohibitive cubic scaling of exact graph kernel evaluation.
arXiv Detail & Related papers (2023-10-07T15:47:31Z) - Simplex Random Features [53.97976744884616]
We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels.
We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels.
We show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.
arXiv Detail & Related papers (2023-01-31T18:53:39Z) - Local Random Feature Approximations of the Gaussian Kernel [14.230653042112834]
We focus on the popular Gaussian kernel and on techniques to linearize kernel-based models by means of random feature approximations.
We show that such approaches yield poor results when modelling high-frequency data, and we propose a novel localization scheme that improves kernel approximations and downstream performance significantly.
arXiv Detail & Related papers (2022-04-12T09:52:36Z) - Improved Random Features for Dot Product Kernels [12.321353062415701]
We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels.
We show empirically that the use of complex features can significantly reduce the variances of these approximations.
We develop a data-driven optimization approach to improve random feature approximations for general dot product kernels.
arXiv Detail & Related papers (2022-01-21T14:16:56Z) - Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs).
HRFs automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z) - A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions, while achieving comparable error bounds in both theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Function Approximation via Sparse Random Features [23.325877475827337]
This paper introduces the sparse random feature method that learns parsimonious random feature models utilizing techniques from compressive sensing.
We show that the sparse random feature method outperforms shallow networks on well-structured functions and in applications to scientific machine learning tasks.
arXiv Detail & Related papers (2021-03-04T17:53:54Z) - Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.