ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs
- URL: http://arxiv.org/abs/2506.09282v1
- Date: Tue, 10 Jun 2025 22:46:12 GMT
- Title: ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs
- Authors: Dhruv Parikh, Viktor Prasanna
- Abstract summary: Hyperdimensional Computing (HDC) represents and manipulates information using high-dimensional vectors, called hypervectors (HV). Traditional HDC methods rely on single-pass, non-parametric training and often suffer from low accuracy. Inference, however, remains lightweight and well-suited for real-time execution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents and manipulates information using high-dimensional vectors, called hypervectors (HV). Traditional HDC methods, while robust to noise and inherently parallel, rely on single-pass, non-parametric training and often suffer from low accuracy. To address this, recent approaches adopt iterative training of base and class HVs, typically accelerated on GPUs. Inference, however, remains lightweight and well-suited for real-time execution. Yet, efficient HDC inference has been studied almost exclusively on specialized hardware such as FPGAs and GPUs, with limited attention to general-purpose multi-core CPUs. To address this gap, we propose ScalableHD for scalable and high-throughput HDC inference on multi-core CPUs. ScalableHD employs a two-stage pipelined execution model, where each stage is parallelized across cores and processes chunks of base and class HVs. Intermediate results are streamed between stages using a producer-consumer mechanism, enabling on-the-fly consumption and improving cache locality. To maximize performance, ScalableHD integrates memory tiling and NUMA-aware worker-to-core binding. Further, it features two execution variants tailored for small and large batch sizes, each designed to exploit compute parallelism based on workload characteristics while mitigating the memory-bound compute pattern that limits HDC inference performance on modern multi-core CPUs. ScalableHD achieves up to 10x speedup in throughput (samples per second) over state-of-the-art baselines such as TorchHD, across a diverse set of tasks ranging from human activity recognition to image classification, while preserving task accuracy. Furthermore, ScalableHD exhibits robust scalability: increasing the number of cores yields near-proportional throughput improvements.
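To make the two-stage pipelined execution model concrete, the following is a minimal Python sketch, assuming a plain linear encoding (real HDC pipelines typically add binding and nonlinear steps) and a bounded queue as the producer-consumer channel between stages; all names, shapes, and the chunk size are illustrative assumptions, not ScalableHD's actual implementation.

```python
import numpy as np
from queue import Queue
from threading import Thread

D, F, C = 10_000, 512, 10  # HV dimension, input features, classes (assumed)
CHUNK = 1_000              # HV-dimension chunk streamed between stages (assumed)

base_hvs = np.random.randn(F, D).astype(np.float32)   # base HV matrix
class_hvs = np.random.randn(C, D).astype(np.float32)  # class HV matrix

def infer(batch: np.ndarray) -> np.ndarray:
    """Two-stage pipeline: stage 1 encodes one chunk of the HV dimension at a
    time; stage 2 consumes each chunk on the fly to accumulate class scores."""
    q: Queue = Queue(maxsize=2)  # bounded channel keeps chunks cache-resident
    scores = np.zeros((batch.shape[0], C), dtype=np.float32)

    def stage1_encode():  # producer
        for lo in range(0, D, CHUNK):
            hi = lo + CHUNK
            q.put((lo, hi, batch @ base_hvs[:, lo:hi]))  # partial encoding
        q.put(None)  # end-of-stream marker

    def stage2_score():  # consumer
        while (item := q.get()) is not None:
            lo, hi, enc = item
            scores += enc @ class_hvs[:, lo:hi].T  # partial similarity

    workers = [Thread(target=stage1_encode), Thread(target=stage2_score)]
    for w in workers: w.start()
    for w in workers: w.join()
    return scores.argmax(axis=1)  # predicted class per sample

preds = infer(np.random.randn(32, F).astype(np.float32))
```

A real implementation would additionally pin a pool of workers per stage to specific cores (NUMA-aware binding) and tile the matrix products for cache reuse; the sketch only shows the streaming dataflow between the two stages.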
Related papers
- HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference [8.826966369389893]
We present HGCA, a hybrid CPU-GPU attention mechanism for large language models. Experiments across diverse models and workloads show that HGCA achieves superior scalability, supports longer sequences and larger batch sizes, and outperforms existing sparse attention baselines in both performance and accuracy.
arXiv Detail & Related papers (2025-07-03T20:20:33Z)
- DPQ-HD: Post-Training Compression for Ultra-Low Power Hyperdimensional Computing [6.378578005171813]
We propose a novel Post-Training Compression algorithm, Decomposition-Pruning-Quantization (DPQ-HD). DPQ-HD reduces computational and memory overhead by uniquely combining the above three compression techniques. We demonstrate that DPQ-HD achieves up to 20-100x reduction in memory for image and graph classification tasks with only a 1-2% drop in accuracy.
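As a rough illustration of the pruning-plus-quantization flavor of such post-training compression (the decomposition step, DPQ-HD's actual bit widths, and its schedule are omitted; the function name and pruning fraction are assumptions):

```python
import numpy as np

def compress_hvs(hvs: np.ndarray, prune_frac: float = 0.5) -> np.ndarray:
    """Magnitude-prune a hypervector matrix, then 1-bit sign-quantize it."""
    thresh = np.quantile(np.abs(hvs), prune_frac)  # global magnitude cutoff
    mask = np.abs(hvs) >= thresh                   # keep only large entries
    return (np.sign(hvs) * mask).astype(np.int8)   # elements in {-1, 0, +1}

class_hvs = np.random.randn(10, 10_000).astype(np.float32)
small = compress_hvs(class_hvs)  # int8 storage, half the entries zeroed
```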
arXiv Detail & Related papers (2025-05-08T16:54:48Z)
- DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training [85.04885553561164]
Diffusion Transformers (DiTs) have shown remarkable performance in generating high-quality videos. In DiTs, attention can consume up to 95% of processing time and demands specialized context parallelism. This paper introduces DSV to accelerate video DiT training by leveraging the dynamic attention sparsity we empirically observe.
arXiv Detail & Related papers (2025-02-11T14:39:59Z)
- HRVMamba: High-Resolution Visual State Space Model for Dense Prediction [60.80423207808076]
State Space Models (SSMs) with efficient hardware-aware designs have demonstrated significant potential in computer vision tasks.
These models have been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation.
We introduce the Dynamic Visual State Space (DVSS) block, which employs deformable convolution to mitigate the long-range forgetting problem.
We also introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose the kernel development in two steps: 1) Expressing the computational core using Tensor Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing [58.720142291102135]
This work introduces HDCC, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code. HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend.
To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
arXiv Detail & Related papers (2023-04-24T19:16:03Z)
- HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration [4.783565770657063]
We introduce HDTorch, an open-source, PyTorch-based HDC library with extensions for hypervector operations.
We analyze four HDC benchmark datasets in terms of accuracy, runtime, and memory consumption.
We perform the first-ever HD training and inference analysis of the entirety of the CHB-MIT EEG epilepsy database.
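HDTorch's own API is not reproduced here, but the hypervector operations such a library accelerates are conceptually simple. A generic PyTorch sketch of binding, bundling, and similarity follows; the bipolar representation and all names are assumptions, not HDTorch's interface:

```python
import torch

D = 10_000  # hypervector dimensionality (assumed)

def random_hv(n: int) -> torch.Tensor:
    """n random bipolar {-1, +1} hypervectors."""
    return torch.randint(0, 2, (n, D)).float() * 2 - 1

def bind(a, b):   # binding: elementwise multiply (self-inverse for bipolar HVs)
    return a * b

def bundle(hvs):  # bundling: elementwise majority vote
    return torch.sign(hvs.sum(dim=0))

keys, values = random_hv(5), random_hv(5)
memory = bundle(bind(keys, values))  # superpose five key-value bindings
recovered = bind(memory, keys[0])    # unbinding with a key is noisy
sim = torch.cosine_similarity(recovered, values[0], dim=0)
print(float(sim))  # well above chance (~0 for unrelated hypervectors)
```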
arXiv Detail & Related papers (2022-06-09T19:46:08Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
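A minimal sketch of the multiresolution hash lookup (2D, nearest grid corner only; the real encoding interpolates the 2^d surrounding corners, and the resolutions, table size, and names here are assumptions):

```python
import numpy as np

L, T, FDIM = 4, 2**14, 2  # levels, hash-table entries, features per entry
tables = [np.random.randn(T, FDIM).astype(np.float32) * 1e-4 for _ in range(L)]
PRIMES = np.array([1, 2654435761], dtype=np.uint64)  # spatial-hash primes

def encode(xy: np.ndarray) -> np.ndarray:
    """Concatenate one trainable feature vector per resolution level."""
    feats = []
    for lvl, table in enumerate(tables):
        res = 16 * 2**lvl                    # grid resolution at this level
        cell = (xy * res).astype(np.uint64)  # grid corner containing xy
        idx = int(np.bitwise_xor.reduce(cell * PRIMES)) % T  # hash to a slot
        feats.append(table[idx])
    return np.concatenate(feats)  # input to the small downstream MLP

vec = encode(np.array([0.3, 0.7]))  # length L * FDIM encoding
```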
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
- SHEARer: Highly-Efficient Hyperdimensional Computing by Software-Hardware Enabled Multifold Approximation [7.528764144503429]
We propose SHEARer, an algorithm-hardware co-optimization to improve the performance and energy consumption of HD computing.
SHEARer achieves an average throughput boost of 104,904x (15.7x) and energy savings of up to 56,044x (301x) compared to state-of-the-art encoding methods.
We also develop a software framework that enables training HD models by emulating the proposed approximate encodings.
arXiv Detail & Related papers (2020-07-20T07:58:44Z)