On Simplifying Large-Scale Spatial Vectors: Fast, Memory-Efficient, and Cost-Predictable k-means
- URL: http://arxiv.org/abs/2412.02244v1
- Date: Tue, 03 Dec 2024 08:16:59 GMT
- Title: On Simplifying Large-Scale Spatial Vectors: Fast, Memory-Efficient, and Cost-Predictable k-means
- Authors: Yushuai Ji, Zepeng Liu, Sheng Wang, Yuan Sun, Zhiyong Peng
- Abstract summary: The k-means algorithm can simplify large-scale spatial vectors, such as 2D geo-locations and 3D point clouds, to support fast analytics and learning.
Existing k-means algorithms achieve high performance only at the cost of significant computational resources, such as memory and CPU time.
We propose Dask-means, a fast, memory-efficient, and cost-predictable k-means algorithm.
- Score: 7.192072206801592
- Abstract: The k-means algorithm can simplify large-scale spatial vectors, such as 2D geo-locations and 3D point clouds, to support fast analytics and learning. However, when processing large-scale datasets, existing k-means algorithms achieve high performance only by consuming significant computational resources, such as memory and CPU time. These algorithms, though effective, are not well-suited for resource-constrained devices. In this paper, we propose a fast, memory-efficient, and cost-predictable k-means algorithm called Dask-means. We first accelerate k-means by designing a memory-efficient accelerator, which utilizes an optimized nearest neighbor search over a memory-tunable index to assign spatial vectors to clusters in batches. We then design a lightweight cost estimator to predict the memory cost and runtime of the k-means task, allowing it to request an appropriate amount of memory from the device or to adjust the accelerator's space requirements to meet memory constraints, while ensuring sufficient CPU time for running k-means. Experiments show that when simplifying datasets at the scale of $10^6$ vectors, Dask-means uses less than $30$ MB of memory and achieves over a $168\times$ speedup compared to the widely used Lloyd's algorithm. We also validate Dask-means on mobile devices, where it demonstrates significant speedup and low memory cost compared to other state-of-the-art (SOTA) k-means algorithms. Our cost estimator estimates the memory cost with a difference of less than $3\%$ from the actual value and predicts runtime with an MSE up to $33.3\%$ lower than SOTA methods.
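The accelerator's core step, assigning each vector to its nearest centroid through an index rather than by brute force, can be sketched as follows. This is a minimal illustration, not the authors' implementation: scipy's cKDTree over the centroids stands in for the paper's memory-tunable index, and the function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def batched_assign(vectors, centroids, batch_size=4096):
    """Assign every vector to its nearest centroid, one batch at a time.

    Indexing the k centroids (not the n vectors) keeps the index tiny,
    and batching bounds peak memory by batch_size rather than by n.
    """
    tree = cKDTree(centroids)              # stand-in for the memory-tunable index
    labels = np.empty(len(vectors), dtype=np.int64)
    for start in range(0, len(vectors), batch_size):
        batch = vectors[start:start + batch_size]
        _, nearest = tree.query(batch)     # nearest-centroid search per vector
        labels[start:start + batch_size] = nearest
    return labels

def lloyd_step(vectors, centroids):
    """One Lloyd iteration built on the batched assignment step."""
    labels = batched_assign(vectors, centroids)
    new_centroids = np.stack([
        vectors[labels == c].mean(axis=0) if (labels == c).any() else centroids[c]
        for c in range(len(centroids))
    ])
    return new_centroids, labels
```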
Related papers
- On Storage Neural Network Augmented Approximate Nearest Neighbor Search [1.3654846342364308]
It is necessary to search for the vectors most similar to a given query vector in data stored on storage devices, rather than in data held in memory.
In this paper, we propose a method to predict the correct clusters by means of a neural network.
Compared to state-of-the-art SPANN and an exhaustive method using k-means clustering and linear search, the proposed method achieves 90% recall on SIFT1M with 80% and 58% less data fetched from storage, respectively.
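The idea reduces to: cluster the base vectors, train a small network to predict which clusters a query's neighbors live in, and fetch only those clusters from storage. A rough sketch under assumptions (scikit-learn stand-ins; the paper targets the clusters containing the true nearest neighbors, simplified here to each vector's own cluster label):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

base = np.random.randn(10_000, 64).astype(np.float32)   # vectors on "storage"
km = KMeans(n_clusters=64, n_init=10).fit(base)

# Simplification: train on each vector's own cluster label; the paper
# instead targets the clusters that contain a query's true neighbors.
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300).fit(base, km.labels_)

def search(query, top_clusters=4, top_k=10):
    """Fetch only the clusters the network deems likely, then scan them."""
    probs = clf.predict_proba(query.reshape(1, -1))[0]
    picked = clf.classes_[np.argsort(probs)[-top_clusters:]]
    mask = np.isin(km.labels_, picked)                   # simulated storage fetch
    dists = np.linalg.norm(base[mask] - query, axis=1)
    return np.flatnonzero(mask)[np.argsort(dists)[:top_k]]
```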
arXiv Detail & Related papers (2025-01-23T06:56:18Z) - RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval [24.472784635757016]
RetrievalAttention is a training-free approach to both accelerate attention computation and reduce GPU memory consumption.
We show that RetrievalAttention achieves near-full attention accuracy while requiring access to only 1-3% of the data.
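A minimal sketch of the retrieval-then-attend pattern, with exact top-k selection standing in for the approximate nearest neighbor index that would make the retrieval cheap (names and shapes are illustrative):

```python
import numpy as np

def retrieval_attention(q, K, V, top_k):
    """Attend only to the top_k keys most similar to the query q.

    np.argpartition stands in for the ANN index that would retrieve
    the ~1-3% most relevant KV pairs without scoring every key.
    """
    scores = K @ q                                  # similarity of each key to q
    idx = np.argpartition(scores, -top_k)[-top_k:]  # retrieved subset
    s = scores[idx] / np.sqrt(q.shape[0])
    w = np.exp(s - s.max())
    w /= w.sum()                                    # softmax over the subset only
    return w @ V[idx]

q = np.random.randn(64)
K, V = np.random.randn(100_000, 64), np.random.randn(100_000, 64)
out = retrieval_attention(q, K, V, top_k=2_000)     # touches 2% of the cache
```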
arXiv Detail & Related papers (2024-09-16T17:59:52Z) - Loki: Low-rank Keys for Efficient Sparse Attention [44.74682508879725]
Inference on large language models (LLMs) can be expensive in terms of the compute and memory costs involved.
In this work, we propose to approximate self-attention by exploiting the low dimensionality of the key vectors computed in the attention block.
We propose Loki, a novel sparse attention method that ranks and selects tokens in the KV-cache based on attention scores computed in low-dimensional space.
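In sketch form: project the keys onto their top principal components offline, rank tokens with cheap low-dimensional scores, then compute exact attention over the selected tokens only. A rough illustration under assumptions, not the paper's implementation:

```python
import numpy as np

def loki_style_attention(q, K, V, rank, top_k):
    # Principal directions of the keys (computed offline in practice).
    Kc = K - K.mean(axis=0)
    _, _, Vt = np.linalg.svd(Kc, full_matrices=False)
    P = Vt[:rank].T                              # d x rank projection

    cheap = (K @ P) @ (P.T @ q)                  # scores in rank dims, O(n * rank)
    idx = np.argpartition(cheap, -top_k)[-top_k:]

    s = (K[idx] @ q) / np.sqrt(q.shape[0])       # exact scores, selected tokens only
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V[idx]

q = np.random.randn(64)
K, V = np.random.randn(8_000, 64), np.random.randn(8_000, 64)
out = loki_style_attention(q, K, V, rank=8, top_k=512)
```

In practice the projected keys K @ P would be cached alongside the KV-cache, so only the rank-dimensional scoring runs per query.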
arXiv Detail & Related papers (2024-06-04T17:58:03Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
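The memory saving comes from not keeping full AdamW-style accumulators per parameter. A generic sketch of the factored second-moment idea (in the style of Adafactor, which the summary echoes; this is not AdaLomo's exact update rule):

```python
import numpy as np

def factored_adaptive_step(W, G, R, C, lr=1e-3, beta=0.99):
    """Adaptive update with O(m + n) optimizer state for an m x n weight.

    R (per-row) and C (per-column) track running means of G**2; their
    rank-1 outer product reconstructs the full second moment that AdamW
    would have to store as an m x n array.
    """
    G2 = G * G
    R *= beta; R += (1 - beta) * G2.mean(axis=1)
    C *= beta; C += (1 - beta) * G2.mean(axis=0)
    V = np.outer(R, C) / max(R.mean(), 1e-30)     # exact when G2 is rank-1
    W -= lr * G / (np.sqrt(V) + 1e-8)             # in-place, no m x n state
```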
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
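The general mechanism: draw random features phi(x) so that phi(x) @ phi(y) approximates a kernel, then work with the n x D feature matrix instead of the n x n kernel. A simple sketch using random Fourier features for an RBF kernel (the NNGP kernel in the paper requires its own feature construction):

```python
import numpy as np

def rff(X, n_features, gamma, rng):
    """phi(x) such that phi(x) @ phi(y) ~= exp(-gamma * ||x - y||**2)."""
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))
Phi = rff(X, n_features=4096, gamma=0.05, rng=rng)
K_approx = Phi @ Phi.T                  # O(n * n_features), GPU-friendly
K_exact = np.exp(-0.05 * ((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())  # small for large n_features
```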
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O\big(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T}\big)$ with a communication cost of $O(k \log(d))$ at each iteration.
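The communication saving rests on a sketching primitive: compress a d-dimensional gradient into k counters before sending it. A minimal count-sketch illustration (the compression step only; the full algorithm feeds the decompressed estimates into AMSGrad-style moments):

```python
import numpy as np

def make_sketch_params(d, k, seed=0):
    """Random buckets and signs, shared by all workers via the seed."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, k, size=d), rng.choice([-1.0, 1.0], size=d)

def compress(grad, buckets, signs, k):
    """d-dimensional gradient -> k counters, so only O(k) is communicated."""
    sketch = np.zeros(k)
    np.add.at(sketch, buckets, signs * grad)
    return sketch

def decompress(sketch, buckets, signs):
    """Unbiased per-coordinate estimate read back from the counters."""
    return signs * sketch[buckets]

d, k = 1_000_000, 10_000
buckets, signs = make_sketch_params(d, k)
g = np.random.default_rng(1).normal(size=d)
g_hat = decompress(compress(g, buckets, signs, k), buckets, signs)
```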
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method for solving the large-scale multimedia retrieval problem.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data be stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Exact Acceleration of K-Means++ and K-Means$\|$ [22.66983713481359]
K-Means++ and K-Means$\|$ have become de facto tools for selecting the initial seeds of K-means.
We develop specialized triangle inequality pruning strategies and a dynamic priority queue to show the first acceleration of K-Means++ and K-Means$\|$.
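For reference, the baseline being accelerated is standard D^2 seeding, whose cost is dominated by refreshing every point's distance to its nearest seed; the paper's triangle-inequality pruning avoids most of that refresh. A plain-numpy sketch of the baseline:

```python
import numpy as np

def kmeans_pp_seed(X, k, rng):
    """Standard K-Means++ (D^2) seeding, updating the cached
    nearest-seed distances naively after every new seed."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)     # distance to nearest seed
    for _ in range(k - 1):
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])  # favor far points
        d2 = np.minimum(d2, ((X - centers[-1]) ** 2).sum(axis=1))
    return np.stack(centers)

rng = np.random.default_rng(0)
seeds = kmeans_pp_seed(rng.normal(size=(5_000, 16)), k=32, rng=rng)
```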
arXiv Detail & Related papers (2021-05-06T20:22:55Z) - An Efficient $k$-modes Algorithm for Clustering Categorical Datasets [8.528384027684194]
We provide a novel, computationally efficient implementation of $k$-modes, called OTQT.
We prove that OTQT finds objective-improving updates that are undetectable to existing $k$-modes algorithms.
OTQT is always more accurate per iteration and almost always faster in reaching the final optimum (only marginally slower on some datasets).
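For context, the textbook $k$-modes loop that OTQT improves on alternates Hamming-distance assignment with per-column mode updates; a compact sketch (assuming non-negative integer category codes; this is the baseline, not OTQT's update search):

```python
import numpy as np

def kmodes(X, k, iters=20, seed=0):
    """Plain k-modes on integer-coded categorical data."""
    rng = np.random.default_rng(seed)
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Hamming distance from every point to every mode.
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):                 # per-column most frequent value
                modes[c] = [np.bincount(col).argmax() for col in members.T]
    return modes, labels

X = np.random.default_rng(1).integers(0, 5, size=(2_000, 12))
modes, labels = kmodes(X, k=8)
```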
arXiv Detail & Related papers (2020-06-06T18:41:36Z)