IDKM: Memory Efficient Neural Network Quantization via Implicit,
Differentiable k-Means
- URL: http://arxiv.org/abs/2312.07759v2
- Date: Fri, 15 Dec 2023 21:46:10 GMT
- Title: IDKM: Memory Efficient Neural Network Quantization via Implicit,
Differentiable k-Means
- Authors: Sean Jaffe, Ambuj K. Singh, Francesco Bullo
- Abstract summary: We propose an implicit, differentiable $k$-means algorithm (IDKM) which eliminates the major memory restriction of DKM.
We show that IDKM achieves comparable performance to DKM with less compute time and less memory.
We also use IDKM and IDKM-JFB to quantize a large neural network, ResNet-18, on hardware where DKM cannot train at all.
- Score: 20.13045791225961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing large neural networks with minimal performance loss is crucial to
enabling their deployment on edge devices. Cho et al. (2022) proposed a weight
quantization method that uses an attention-based clustering algorithm called
differentiable $k$-means (DKM). Despite achieving state-of-the-art results,
DKM's performance is constrained by its heavy memory dependency. We propose an
implicit, differentiable $k$-means algorithm (IDKM), which eliminates the major
memory restriction of DKM. Let $t$ be the number of $k$-means iterations, $m$
be the number of weight-vectors, and $b$ be the number of bits per cluster
address. IDKM reduces the overall memory complexity of a single $k$-means layer
from $\mathcal{O}(t \cdot m \cdot 2^b)$ to $\mathcal{O}(m \cdot 2^b)$. We also
introduce a variant, IDKM with Jacobian-Free-Backpropagation (IDKM-JFB), for
which the time complexity of the gradient calculation is independent of $t$ as
well. We provide a proof of concept of our methods by showing that, under the
same settings, IDKM achieves comparable performance to DKM with less compute
time and less memory. We also use IDKM and IDKM-JFB to quantize a large neural
network, ResNet-18, on hardware where DKM cannot train at all.
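To make the memory argument concrete, the following is a minimal, hypothetical PyTorch sketch of the Jacobian-free variant (IDKM-JFB), assuming a DKM-style attention (softmax) soft assignment: the $k$-means iteration is run toward a fixed point under torch.no_grad(), and only one final step is recorded by autograd, so the backward graph is independent of the iteration count $t$. The function names, the temperature tau, and the initialization are illustrative assumptions, not the authors' implementation.

```python
import torch

def dkm_soft_step(weights, centers, tau=1e-2):
    """One DKM-style differentiable k-means update.

    weights: (m, d) tensor of weight-vectors to quantize.
    centers: (K, d) tensor of cluster centers, K = 2**b.
    Returns updated centers and the (m, K) soft-assignment matrix.
    """
    # Attention-style soft assignment: softmax over negative squared distances.
    dists = torch.cdist(weights, centers) ** 2        # (m, K)
    attn = torch.softmax(-dists / tau, dim=1)         # (m, K)
    # New centers are attention-weighted averages of the weight-vectors.
    new_centers = (attn.t() @ weights) / (attn.sum(dim=0).unsqueeze(1) + 1e-12)
    return new_centers, attn

def idkm_jfb_quantize(weights, centers, t=50, tau=1e-2):
    """Jacobian-free backprop sketch: iterate without tracking gradients,
    then record a single step in the autograd graph, so backward memory
    does not grow with t."""
    with torch.no_grad():
        for _ in range(t):
            centers, _ = dkm_soft_step(weights, centers, tau)
    # Only this step is differentiated; gradients reach `weights` through it.
    centers, attn = dkm_soft_step(weights, centers, tau)
    # Soft-quantized weights used in the layer's forward pass.
    return attn @ centers

# Illustrative usage on random weights (b = 4 bits, so K = 16 clusters).
w = torch.randn(1024, 4, requires_grad=True)
c0 = w.detach()[torch.randperm(1024)[:16]]
loss = (idkm_jfb_quantize(w, c0) ** 2).mean()
loss.backward()   # graph holds one k-means step, not t of them
```

In this sketch the autograd graph retains a single $m \times 2^b$ assignment matrix rather than $t$ of them, mirroring the stated reduction from $\mathcal{O}(t \cdot m \cdot 2^b)$ to $\mathcal{O}(m \cdot 2^b)$; full IDKM instead differentiates the fixed point implicitly via the implicit function theorem, which this sketch does not implement.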
Related papers
- Rectified Gaussian kernel multi-view k-means clustering [0.0]
We show two new variants of multi-view k-means (MVKM) algorithms to address multi-view data.
The general idea is to measure the distance between the $h$-th view data points $x_i^h$ and the $h$-th view cluster centers $a_k^h$ in a manner different from the standard centroid-based approach.
arXiv Detail & Related papers (2024-05-09T08:35:21Z)
- A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM) [37.87282737463472]
We propose VecKM, a local point cloud geometry encoder that is descriptive and efficient to compute.
VecKM constructs the local geometry encoding using all neighboring points, producing a more descriptive encoding.
VecKM is efficient to compute and scalable to large point cloud inputs.
arXiv Detail & Related papers (2024-04-02T02:01:21Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without significant computational overhead.
We demonstrate our approach on various image- and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Weak Schur sampling with logarithmic quantum memory [0.0]
We introduce a new algorithm for the task of weak Schur sampling.
Our algorithm efficiently determines both the Young label, which indexes the irreducible representations, and the multiplicity label of the symmetric group.
arXiv Detail & Related papers (2023-09-21T10:02:46Z)
- Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method for solving the large-scale multimedia retrieval problem.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
- Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z)
- Consistent Structured Prediction with Max-Min Margin Markov Networks [84.60515484036239]
Max-margin methods for binary classification have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$).
We overcome such limitations by defining the learning problem in terms of a "max-min" margin formulation, naming the resulting method max-min margin Markov networks ($M^4N$).
Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-07-02T10:48:42Z)
- Improving The Performance Of The K-means Algorithm [2.28438857884398]
My thesis proposes two algorithms to speed up IKM while approximately preserving the quality of its clustering results.
The first algorithm, called Divisive K-means, improves the speed of IKM by speeding up its splitting process of clusters.
The second algorithm, called Parallel Two-Phase K-means (Par2PK-means), parallelizes IKM by employing the model of Two-Phase K-means.
arXiv Detail & Related papers (2020-05-10T15:09:44Z)