Memory and Computation-Efficient Kernel SVM via Binary Embedding and
Ternary Model Coefficients
- URL: http://arxiv.org/abs/2010.02577v1
- Date: Tue, 6 Oct 2020 09:41:54 GMT
- Title: Memory and Computation-Efficient Kernel SVM via Binary Embedding and
Ternary Model Coefficients
- Authors: Zijian Lei, Liang Lan
- Abstract summary: Kernel approximation is widely used to scale up kernel SVM training and prediction.
Memory and computation costs of kernel approximation models are still too high if we want to deploy them on memory-limited devices.
We propose a novel memory and computation-efficient kernel SVM model by using both binary embedding and binary model coefficients.
- Score: 18.52747917850984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel approximation is widely used to scale up kernel SVM training and
prediction. However, the memory and computation costs of kernel approximation
models are still too high if we want to deploy them on memory-limited devices
such as mobile phones, smartwatches, and IoT devices. To address this
challenge, we propose a novel memory and computation-efficient kernel SVM model
by using both binary embedding and binary model coefficients. First, we propose
an efficient way to generate compact binary embedding of the data, preserving
the kernel similarity. Second, we propose a simple but effective algorithm to
learn a linear classification model with ternary coefficients that can support
different types of loss function and regularizer. Our algorithm can achieve
better generalization accuracy than existing works on learning binary
coefficients since we allow coefficient to be $-1$, $0$, or $1$ during the
training stage, and coefficient $0$ can be removed during model inference for
binary classification. Moreover, we provide a detailed analysis of the
convergence of our algorithm and the inference complexity of our model. The
analysis shows that the convergence to a local optimum is guaranteed, and the
inference complexity of our model is much lower than other competing methods.
Our experimental results on five large real-world datasets have demonstrated
that our proposed method can build accurate nonlinear SVM models with memory
costs less than 30KB.
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - Robust kernel-free quadratic surface twin support vector machine with capped $L_1$-norm distance metric [0.46040036610482665]
This paper proposes a robust capped L_norm kernel-free surface twin support vector machine (CL_QTSVM)
The robustness of our model is further improved by employing the capped L_norm distance metric.
An iterative algorithm is developed to efficiently solve the proposed model.
arXiv Detail & Related papers (2024-05-27T09:23:52Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - A Brain-Inspired Low-Dimensional Computing Classifier for Inference on
Tiny Devices [17.976792694929063]
We propose a low-dimensional computing (LDC) alternative to hyperdimensional computing (HDC)
We map our LDC classifier into a neural equivalent network and optimize our model using a principled training approach.
Our LDC classifier offers an overwhelming advantage over the existing brain-inspired HDC models and is particularly suitable for inference on tiny devices.
arXiv Detail & Related papers (2022-03-09T17:20:12Z) - Tensor Network Kalman Filtering for Large-Scale LS-SVMs [17.36231167296782]
Least squares support vector machines are used for nonlinear regression and classification.
A framework based on tensor networks and the Kalman filter is presented to alleviate the demanding memory and computational complexities.
Results show that our method can achieve high performance and is particularly useful when alternative methods are computationally infeasible.
arXiv Detail & Related papers (2021-10-26T08:54:03Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Binarization Methods for Motor-Imagery Brain-Computer Interface
Classification [18.722731794073756]
We propose methods for transforming real-valued weights to binary numbers for efficient inference.
By tuning the dimension of the binary embedding, we achieve almost the same accuracy in 4-class MI ($leq$1.27% lower) compared to models with float16 weights.
Our method replaces the fully connected layer of CNNs with a binary augmented memory using bipolar random projection.
arXiv Detail & Related papers (2020-10-14T12:28:18Z) - Scaling Distributed Deep Learning Workloads beyond the Memory Capacity
with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g. Megatron-LM and Turning-NLG.
arXiv Detail & Related papers (2020-08-26T07:24:34Z) - On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.