StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm
- URL: http://arxiv.org/abs/2108.10411v1
- Date: Mon, 23 Aug 2021 21:03:09 GMT
- Title: StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm
- Authors: Andreas Oslandsbotn, Zeljko Kereta, Valeriya Naumova, Yoav Freund,
Alexander Cloninger
- Abstract summary: Existing implementations of KRR require that all the data be stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
- Score: 60.61943386819384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel ridge regression (KRR) is a popular scheme for non-linear
non-parametric learning. However, existing implementations of KRR require that
all the data be stored in main memory, which severely limits the use of KRR
in contexts where data size far exceeds the memory size. Such applications are
increasingly common in data mining, bioinformatics, and control. A powerful
paradigm for computing on data sets that are too large for memory is the
streaming model of computation, where we process one data sample at a time,
discarding each sample before moving on to the next one.
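(For background, and not notation from this paper: the standard batch KRR estimator is $\hat{f}(x) = \sum_{i=1}^{n} \alpha_i k(x, x_i)$ with coefficients $\alpha = (K + \lambda n I)^{-1} y$, where $K_{ij} = k(x_i, x_j)$ is the $n \times n$ kernel matrix. Forming and inverting $K$ is exactly what becomes infeasible once $n$ outgrows memory.)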
In this paper, we propose StreaMRAK - a streaming version of KRR. StreaMRAK
improves on existing KRR schemes by dividing the problem into several levels of
resolution, which allows continual refinement of the predictions. The algorithm
reduces the memory requirement by continuously and efficiently integrating new
samples into the training model. With a novel sub-sampling scheme, StreaMRAK
reduces memory and computational complexities by creating a sketch of the
original data, where the sub-sampling density is adapted to the bandwidth of
the kernel and the local dimensionality of the data.
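As a minimal sketch of bandwidth-adapted sub-sampling (illustrative only, not the authors' scheme, which also adapts to local dimensionality): keep a streamed sample at a resolution level only if it lies farther than that level's bandwidth from every sample already kept. The helper below and its names are hypothetical.

```python
import numpy as np

def streaming_epsilon_net(stream, bandwidths):
    """Keep one sketch per bandwidth level: a streamed point is stored at a
    level only if no already-stored point at that level lies within the
    level's bandwidth. Each sample is processed once and then discarded."""
    sketches = {h: [] for h in bandwidths}
    for x in stream:
        for h, kept in sketches.items():
            if all(np.linalg.norm(x - c) > h for c in kept):
                kept.append(x)
    return sketches

rng = np.random.default_rng(0)
stream = rng.normal(size=(2000, 2))          # toy 2-D data, one row per sample
nets = streaming_epsilon_net(stream, bandwidths=[1.0, 0.5, 0.25])
for h, kept in nets.items():
    print(f"bandwidth {h}: kept {len(kept)} of 2000 samples")
```

Coarse levels retain a few widely spaced landmarks while finer levels retain denser sketches, mirroring the multi-resolution refinement described above.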
We present a showcase study on two synthetic problems and the prediction of
the trajectory of a double pendulum. The results show that the proposed
algorithm is fast and accurate.
Related papers
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower upper bound on time complexity and reduces the memory complexity from $O(n^2)$ to $O(kn)$ with $k \ll n$.
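To make the complexity claim concrete: storing only each sample's $k$ nearest neighbors, rather than the full pairwise-distance matrix, is the generic mechanism that trades $O(n^2)$ memory for $O(kn)$. The sketch below is a plain k-NN table, not the paper's Re-Ranking algorithm.

```python
import numpy as np

def knn_table(X, k):
    """For each row of X, store only the indices and distances of its k
    nearest neighbors: O(kn) memory instead of the O(n^2) full matrix."""
    n = len(X)
    idx = np.empty((n, k), dtype=int)
    dist = np.empty((n, k))
    for i in range(n):                        # one distance row at a time
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                         # exclude the point itself
        nn = np.argpartition(d, k)[:k]        # k smallest, unordered
        order = np.argsort(d[nn])
        idx[i], dist[i] = nn[order], d[nn][order]
    return idx, dist

X = np.random.default_rng(1).normal(size=(500, 16))
idx, dist = knn_table(X, k=10)                # two 500x10 tables, not 500x500
```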
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
- Angular upsampling in diffusion MRI using contextual HemiHex sub-sampling in q-space [0.0]
It is important to incorporate relevant context for the data, so that maximum prior information is available to the AI model when inferring the posterior.
In this paper, we introduce HemiHex subsampling to address training-data sampling on the q-space geometry.
Our proposed approach is a geometrically optimized regression technique which infers the unknown q-space, thus addressing the limitations of earlier studies.
arXiv Detail & Related papers (2022-11-01T03:13:07Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
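The general random-feature mechanic is easy to sketch. The example below uses classic random Fourier features for a Gaussian kernel, not the NNGP kernel RFAD actually approximates, and the function name `rff` is illustrative.

```python
import numpy as np

def rff(X, D, gamma, seed=0):
    """Random Fourier features: z(x) @ z(y) ~= exp(-gamma * ||x - y||^2),
    so the n x n kernel matrix is replaced by an n x D feature matrix."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(2).normal(size=(200, 5))
Z = rff(X, D=4000, gamma=0.5)
K_approx = Z @ Z.T                            # explicit feature map
K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())       # shrinks as D grows
```

With an explicit map $z$ satisfying $z(x)^\top z(y) \approx k(x, y)$, kernel regression reduces to linear regression on the feature matrix, which is where the speedup comes from.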
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
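For reference, incremental PCA is available off the shelf in scikit-learn; a minimal one-pass sketch follows (the paper's use of IPCA inside its recursive least-squares update is more involved).

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(3)
ipca = IncrementalPCA(n_components=5)
for _ in range(100):                   # data arrives in small batches
    batch = rng.normal(size=(50, 20))  # toy 20-dimensional samples
    ipca.partial_fit(batch)            # update the subspace, discard the batch
print(ipca.explained_variance_ratio_)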
arXiv Detail & Related papers (2022-07-28T02:01:31Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated seamlessly with neural networks.
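A toy rendition of the idea, assuming a plain HMM forward pass: sum over only a random subset of predecessor states at each step. This biased sketch only conveys the cost structure; the paper's RDP estimators are constructed more carefully, and all names here are illustrative.

```python
import numpy as np

def randomized_forward(log_trans, log_emit, obs, m, seed=0):
    """Forward algorithm over an HMM that, at each step, sums over only m
    randomly sampled predecessor states: O(mK) per step instead of O(K^2)."""
    rng = np.random.default_rng(seed)
    K = log_trans.shape[0]
    alpha = log_emit[:, obs[0]].copy()        # uniform prior, up to a constant
    for t in range(1, len(obs)):
        keep = rng.choice(K, size=m, replace=False)
        scores = alpha[keep, None] + log_trans[keep, :]      # (m, K)
        alpha = np.logaddexp.reduce(scores, axis=0) + log_emit[:, obs[t]]
    return np.logaddexp.reduce(alpha)         # approximate log-likelihood

K, V = 1000, 8
rng = np.random.default_rng(4)
log_trans = np.log(rng.dirichlet(np.ones(K), size=K))   # rows: p(z' | z)
log_emit = np.log(rng.dirichlet(np.ones(V), size=K))    # rows: p(x | z)
obs = rng.integers(0, V, size=30)
print(randomized_forward(log_trans, log_emit, obs, m=64))
```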
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and that accurate flow estimation can be achieved with only a fraction of its elements.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z)
- Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients [18.52747917850984]
Kernel approximation is widely used to scale up kernel SVM training and prediction.
However, the memory and computation costs of kernel approximation models are still too high for deployment on memory-limited devices.
We propose a novel memory and computation-efficient kernel SVM model by using both binary embedding and binary model coefficients.
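The classic binary-embedding mechanic behind such models is SimHash-style: signs of random projections, whose Hamming distances estimate angles between inputs. A generic sketch, not the paper's exact model (which also quantizes the model coefficients):

```python
import numpy as np

def binary_embed(X, bits, seed=0):
    """SimHash-style binary embedding: signs of random projections. The
    Hamming distance between codes estimates the angle between inputs,
    so similarity computations can run on bit vectors."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], bits))
    return X @ W > 0                          # one boolean per projection

rng = np.random.default_rng(6)
X = rng.normal(size=(2, 32))
B = binary_embed(X, bits=4096)
angle_est = np.pi * np.mean(B[0] != B[1])     # P(sign flip) = angle / pi
angle_true = np.arccos(X[0] @ X[1] / (np.linalg.norm(X[0]) * np.linalg.norm(X[1])))
print(angle_est, angle_true)                  # close for enough bits
```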
arXiv Detail & Related papers (2020-10-06T09:41:54Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$, the mixing time of the chain.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
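The generic replay mechanic is straightforward to sketch. Everything below (names, the AR(1) toy stream, step sizes) is illustrative rather than the paper's algorithm, whose guarantees are specific to Markovian least squares: buffering samples and updating on uniform draws from the buffer decorrelates consecutive gradient steps.

```python
import numpy as np
from collections import deque

def sgd_with_replay(stream, dim, buffer_size=500, draws=32, lr=0.5, seed=0):
    """Least-squares SGD on a correlated data stream: push each incoming
    sample into a replay buffer and update on uniform draws from it, so
    consecutive gradient steps are less correlated than the raw chain."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    buffer = deque(maxlen=buffer_size)        # evicts the oldest sample
    for x, y in stream:
        buffer.append((x, y))
        for j in rng.integers(0, len(buffer), size=draws):
            xj, yj = buffer[j]
            w -= lr * (xj @ w - yj) * xj      # least-squares gradient step
    return w

def make_stream(n, rng, w_true):
    """Covariates follow a slowly mixing AR(1) chain; targets are noiseless."""
    x = rng.normal(size=w_true.size)
    for _ in range(n):
        x = 0.9 * x + 0.1 * rng.normal(size=w_true.size)
        yield x.copy(), x @ w_true

rng = np.random.default_rng(7)
w_true = rng.normal(size=4)
print(sgd_with_replay(make_stream(2000, rng, w_true), dim=4), w_true)
```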
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.