Related papers: Efficient Inference via Universal LSH Kernel

Efficient Inference via Universal LSH Kernel

URL: http://arxiv.org/abs/2106.11426v1
Date: Mon, 21 Jun 2021 22:06:32 GMT
Title: Efficient Inference via Universal LSH Kernel
Authors: Zichang Liu, Benjamin Coleman, Anshumali Shrivastava
Abstract summary: We propose mathematically provable Representer Sketch, a concise set of count arrays that can approximate the inference procedure with simple hashing computations and aggregations. Representer Sketch builds upon the popular Representer Theorem from kernel literature, hence the name. We show that Representer Sketch achieves up to 114x reduction in storage requirement and 59x reduction in complexity without any drop in accuracy.
Score: 35.22983601434134
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large machine learning models achieve unprecedented performance on various tasks and have evolved as the go-to technique. However, deploying these compute and memory hungry models on resource constraint environments poses new challenges. In this work, we propose mathematically provable Representer Sketch, a concise set of count arrays that can approximate the inference procedure with simple hashing computations and aggregations. Representer Sketch builds upon the popular Representer Theorem from kernel literature, hence the name, providing a generic fundamental alternative to the problem of efficient inference that goes beyond the popular approach such as quantization, iterative pruning and knowledge distillation. A neural network function is transformed to its weighted kernel density representation, which can be very efficiently estimated with our sketching algorithm. Empirically, we show that Representer Sketch achieves up to 114x reduction in storage requirement and 59x reduction in computation complexity without any drop in accuracy.

Related papers

Scaling Probabilistic Circuits via Monarch Matrices [109.65822339230853]
Probabilistic Circuits (PCs) are tractable representations of probability distributions.<n>We propose a novel sparse and structured parameterization for the sum blocks in PCs.
arXiv Detail & Related papers (2025-06-14T07:39:15Z)
Skelite: Compact Neural Networks for Efficient Iterative Skeletonization [38.6452617345504]
Skeletonization extracts thin representations from images that compactly encode their geometry and topology. We propose a novel framework for training iterative skeletonization algorithms with a learnable component. Our method demonstrates a 100 times speedup over topology-constrained algorithms.
arXiv Detail & Related papers (2025-03-10T14:27:25Z)
Post-Training Non-Uniform Quantization for Convolutional Neural Networks [0.0]
Quantization is a technique that aims to alleviate large storage requirements and speed up the inference process. In this paper, we introduce a novel post-training quantization method for model weights. Our method finds optimal clipping thresholds and scaling factors along with mathematical guarantees that our method minimizes quantization noise.
arXiv Detail & Related papers (2024-12-10T10:33:58Z)
Efficient Fairness-Performance Pareto Front Computation [51.558848491038916]
We show that optimal fair representations possess several useful structural properties. We then show that these approxing problems can be solved efficiently via concave programming methods.
arXiv Detail & Related papers (2024-09-26T08:46:48Z)
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization [0.6445087473595953]
Large language models (LLMs) demonstrate outstanding performance in various tasks in machine learning. deploying LLM inference poses challenges due to the high compute and memory requirements. We present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision.
arXiv Detail & Related papers (2024-06-16T09:51:55Z)
Sparse Variational Student-t Processes [8.46450148172407]
Student-t Processes are used to model heavy-tailed distributions and datasets with outliers. We propose a sparse representation framework to allow Student-t Processes to be more flexible for real-world datasets. We evaluate two proposed approaches on various synthetic and real-world datasets from UCI and Kaggle.
arXiv Detail & Related papers (2023-12-09T12:55:20Z)
An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks. The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions. We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations. We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery. We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search [7.971065005161565]
quantization is a technique to convert floating point representations to low bit-width fixed point representations. We show how to learn new quantized weights over the entire quantized space. We show the ability of the method to achieve state-of-the-art compression rates in both, data-free and data-driven configurations.
arXiv Detail & Related papers (2023-08-10T14:19:58Z)
Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs) We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU) We then show how to adapt this algorithm to also work in the broader class of $gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains. We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference. It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.