IDKM: Memory Efficient Neural Network Quantization via Implicit,
Differentiable k-Means
- URL: http://arxiv.org/abs/2312.07759v2
- Date: Fri, 15 Dec 2023 21:46:10 GMT
- Title: IDKM: Memory Efficient Neural Network Quantization via Implicit,
Differentiable k-Means
- Authors: Sean Jaffe, Ambuj K. Singh, Francesco Bullo
- Abstract summary: We propose an implicit, differentiable $k$-means algorithm (IDKM) which eliminates the major memory restriction of DKM.
We show that IDKM achieves comparable performance to DKM with less compute time and less memory.
We also use IDKM and IDKM-JFB to quantize a large neural network, ResNet-18, on hardware where DKM cannot train at all.
- Score: 20.13045791225961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing large neural networks with minimal performance loss is crucial to
enabling their deployment on edge devices. Cho et al. (2022) proposed a weight
quantization method that uses an attention-based clustering algorithm called
differentiable $k$-means (DKM). Despite achieving state-of-the-art results,
DKM's performance is constrained by its heavy memory dependency. We propose an
implicit, differentiable $k$-means algorithm (IDKM), which eliminates the major
memory restriction of DKM. Let $t$ be the number of $k$-means iterations, $m$
be the number of weight-vectors, and $b$ be the number of bits per cluster
address. IDKM reduces the overall memory complexity of a single $k$-means layer
from $\mathcal{O}(t \cdot m \cdot 2^b)$ to $\mathcal{O}(m \cdot 2^b)$. We also
introduce a variant, IDKM with Jacobian-Free-Backpropagation (IDKM-JFB), for
which the time complexity of the gradient calculation is independent of $t$ as
well. We provide a proof of concept of our methods by showing that, under the
same settings, IDKM achieves comparable performance to DKM with less compute
time and less memory. We also use IDKM and IDKM-JFB to quantize a large neural
network, ResNet-18, on hardware where DKM cannot train at all.
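To make the memory argument concrete, the following is a minimal, hypothetical PyTorch sketch of the Jacobian-free variant (IDKM-JFB), assuming a DKM-style attention (softmax) soft assignment: the $k$-means iteration is run toward a fixed point under torch.no_grad(), and only one final step is recorded by autograd, so the backward graph is independent of the iteration count $t$. The function names, the temperature tau, and the initialization are illustrative assumptions, not the authors' implementation.

```python
import torch

def dkm_soft_step(weights, centers, tau=1e-2):
    """One DKM-style differentiable k-means update.

    weights: (m, d) tensor of weight-vectors to quantize.
    centers: (K, d) tensor of cluster centers, K = 2**b.
    Returns updated centers and the (m, K) soft-assignment matrix.
    """
    # Attention-style soft assignment: softmax over negative squared distances.
    dists = torch.cdist(weights, centers) ** 2        # (m, K)
    attn = torch.softmax(-dists / tau, dim=1)         # (m, K)
    # New centers are attention-weighted averages of the weight-vectors.
    new_centers = (attn.t() @ weights) / (attn.sum(dim=0).unsqueeze(1) + 1e-12)
    return new_centers, attn

def idkm_jfb_quantize(weights, centers, t=50, tau=1e-2):
    """Jacobian-free backprop sketch: iterate without tracking gradients,
    then record a single step in the autograd graph, so backward memory
    does not grow with t."""
    with torch.no_grad():
        for _ in range(t):
            centers, _ = dkm_soft_step(weights, centers, tau)
    # Only this step is differentiated; gradients reach `weights` through it.
    centers, attn = dkm_soft_step(weights, centers, tau)
    # Soft-quantized weights used in the layer's forward pass.
    return attn @ centers

# Illustrative usage on random weights (b = 4 bits, so K = 16 clusters).
w = torch.randn(1024, 4, requires_grad=True)
c0 = w.detach()[torch.randperm(1024)[:16]]
loss = (idkm_jfb_quantize(w, c0) ** 2).mean()
loss.backward()   # graph holds one k-means step, not t of them
```

In this sketch the autograd graph retains a single $m \times 2^b$ assignment matrix rather than $t$ of them, mirroring the stated reduction from $\mathcal{O}(t \cdot m \cdot 2^b)$ to $\mathcal{O}(m \cdot 2^b)$; full IDKM instead differentiates the fixed point implicitly via the implicit function theorem, which this sketch does not implement.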
Related papers
- Rectified Gaussian kernel multi-view k-means clustering [0.0]
We show two new variants of multi-view k-means (MVKM) algorithms to address multi-view data.
The general idea is to measure the distance between the $h$-th view data points $x_i^h$ and the $h$-th view cluster centers $a_k^h$ in a manner different from the standard centroid-based approach.
arXiv Detail & Related papers (2024-05-09T08:35:21Z)
- A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM) [37.87282737463472]
We propose VecKM, a local point cloud geometry encoder that is descriptive and efficient to compute.
VecKM constructs the local geometry encoding using all neighboring points, producing a more descriptive encoding.
VecKM is efficient to compute and scalable to large point cloud inputs.
arXiv Detail & Related papers (2024-04-02T02:01:21Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without significant computational overhead.
We demonstrate our approach on various image- and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Weak Schur sampling with logarithmic quantum memory [0.0]
We introduce a new algorithm for the task of weak Schur sampling.
Our algorithm efficiently determines both the Young label, which indexes the irreducible representations, and the multiplicity label of the symmetric group.
arXiv Detail & Related papers (2023-09-21T10:02:46Z)
- Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method for solving the large-scale multimedia retrieval problem.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
- Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z)
- Consistent Structured Prediction with Max-Min Margin Markov Networks [84.60515484036239]
Max-margin methods for binary classification have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$).
We overcome such limitations by defining the learning problem in terms of a "max-min" margin formulation, naming the resulting method max-min margin Markov networks ($M^4N$).
Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-07-02T10:48:42Z)
- Improving The Performance Of The K-means Algorithm [2.28438857884398]
My thesis proposes two algorithms to speed up IKM while approximately preserving the quality of its clustering results.
The first algorithm, called Divisive K-means, improves the speed of IKM by speeding up its splitting process of clusters.
The second algorithm, called Parallel Two-Phase K-means (Par2PK-means), parallelizes IKM by employing the model of Two-Phase K-means.
arXiv Detail & Related papers (2020-05-10T15:09:44Z)