Primitive-Driven Acceleration of Hyperdimensional Computing for Real-Time Image Classification
- URL: http://arxiv.org/abs/2601.20061v1
- Date: Tue, 27 Jan 2026 21:12:56 GMT
- Title: Primitive-Driven Acceleration of Hyperdimensional Computing for Real-Time Image Classification
- Authors: Dhruv Parikh, Jebacyril Arockiaraj, Viktor Prasanna
- Abstract summary: We develop an image-encoding algorithm that maps local image patches to hypervectors enriched with spatial information. These patch-level hypervectors are then merged into a global representation using the fundamental HDC operations. This encoder achieves 95.67% accuracy on MNIST and 85.14% on Fashion-MNIST, outperforming prior HDC-based image encoders.
- Score: 0.07646713951724012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyperdimensional Computing (HDC) represents data using extremely high-dimensional, low-precision vectors, termed hypervectors (HVs), and performs learning and inference through lightweight, noise-tolerant operations. However, the high dimensionality, sparsity, and repeated data movement involved in HDC make these computations difficult to accelerate efficiently on conventional processors. As a result, executing the core HDC operations (binding, permutation, bundling, and similarity search) on CPUs or GPUs often leads to suboptimal utilization, memory bottlenecks, and limits on real-time performance. In this paper, our contributions are two-fold. First, we develop an image-encoding algorithm that, similar in spirit to convolutional neural networks, maps local image patches to hypervectors enriched with spatial information. These patch-level hypervectors are then merged into a global representation using the fundamental HDC operations, enabling spatially sensitive and robust image encoding. This encoder achieves 95.67% accuracy on MNIST and 85.14% on Fashion-MNIST, outperforming prior HDC-based image encoders. Second, we design an end-to-end accelerator that implements these compute operations on an FPGA through a pipelined architecture that exploits parallelism both across the hypervector dimensionality and across the set of image patches. Our Alveo U280 implementation delivers 0.09 ms inference latency, achieving up to 1300x and 60x speedup over state-of-the-art CPU and GPU baselines, respectively.
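The four primitives named in the abstract (binding, permutation, bundling, and similarity search) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: bipolar (+1/-1) hypervectors, a dimensionality of D = 10,000, majority-vote bundling, and the function names. The helper encode_image is a hypothetical patch-plus-position encoder in the spirit of the description above; it is not the paper's encoder or its FPGA design.

```python
import numpy as np

D = 10_000  # hypervector dimensionality (assumed, not taken from the paper)
rng = np.random.default_rng(0)

def random_hv():
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=D)

def bind(a, b):
    """Binding: element-wise multiplication; self-inverse for bipolar HVs."""
    return a * b

def permute(a, k=1):
    """Permutation: cyclic shift by k positions."""
    return np.roll(a, k)

def bundle(hvs):
    """Bundling: element-wise majority vote (ties resolved to +1)."""
    total = np.sum(np.stack(hvs).astype(np.int32), axis=0)
    return np.where(total >= 0, 1, -1).astype(np.int8)

def similarity(a, b):
    """Normalized dot product in [-1, 1]; equals cosine for bipolar HVs."""
    return float(np.dot(a.astype(np.int32), b.astype(np.int32))) / D

def encode_image(patch_hvs, position_hvs):
    """Hypothetical encoder: bind each patch HV to its position HV, then bundle."""
    return bundle([bind(p, pos) for p, pos in zip(patch_hvs, position_hvs)])

def classify(query, class_hvs):
    """Similarity search: index of the most similar class hypervector."""
    return int(np.argmax([similarity(query, c) for c in class_hvs]))
```

In this sketch, two independently drawn hypervectors are nearly orthogonal (similarity close to 0), which is what lets a bundled image hypervector be matched against class hypervectors by a simple argmax over similarities.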
Related papers
- ProGIC: Progressive and Lightweight Generative Image Compression with Residual Vector Quantization [59.481950697968706]
We propose Progressive Generative Image Compression (ProGIC), a compact built on residual vector quantization (RVQ). In RVQ, a sequence of vector quantizers encodes the residuals stage by stage, each with its own codebook. We pair this with a lightweight backbone based on depthwise-separable convolutions and small attention blocks, enabling practical deployment on both GPU and CPU-only devices.
arXiv Detail & Related papers (2026-03-03T11:47:05Z) - ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs [0.0]
Hyperdimensional Computing (HDC) represents and manipulates information using high-dimensional vectors, called hypervectors (HV). Traditional HDC methods rely on single-pass, non-parametric training and often suffer from low accuracy. Inference, however, remains lightweight and well-suited for real-time execution.
arXiv Detail & Related papers (2025-06-10T22:46:12Z) - Speedy MASt3R [68.47052557089631]
MASt3R redefines image matching as a 3D task by leveraging DUSt3R and introducing a fast reciprocal matching scheme. Fast MASt3R achieves a 54% reduction in inference time (198 ms to 91 ms per image pair) without sacrificing accuracy. This advancement enables real-time 3D understanding, benefiting applications like mixed reality navigation and large-scale 3D scene reconstruction.
arXiv Detail & Related papers (2025-03-13T03:56:22Z) - uHD: Unary Processing for Lightweight and Dynamic Hyperdimensional Computing [1.7118124088316602]
Hyperdimensional computing (HDC) is a novel computational paradigm that operates on long-dimensional vectors known as hypervectors.
In this paper, we show how to generate intensity and position hypervectors in HDC using low-discrepancy sequences.
For the first time in the literature, our proposed approach employs lightweight vector generators utilizing unary bit-streams for efficient encoding of data.
arXiv Detail & Related papers (2023-11-16T06:28:19Z) - INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing [66.00729477511219]
Given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient.
We introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
We present results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively.
arXiv Detail & Related papers (2023-08-11T04:24:39Z) - Efficient stereo matching on embedded GPUs with zero-means cross correlation [8.446808526407738]
We propose a novel acceleration approach for the zero-means normalized cross correlation (ZNCC) matching cost calculation algorithm on a Jetson Tx2 embedded GPU.
In our method for accelerating ZNCC, target images are scanned in a zigzag fashion to efficiently reuse one pixel's computation for its neighboring pixels.
Our system shows real-time processing speed of 32 fps on a Jetson Tx2 GPU for 1,280x384 pixel images with a maximum disparity of 128.
arXiv Detail & Related papers (2022-12-01T13:03:38Z) - Fast and High-Quality Image Denoising via Malleable Convolutions [72.18723834537494]
We present Malleable Convolution (MalleConv), as an efficient variant of dynamic convolution.
Unlike previous works, MalleConv generates a much smaller set of spatially-varying kernels from input.
We also build an efficient denoising network using MalleConv, coined as MalleNet.
arXiv Detail & Related papers (2022-01-02T18:35:20Z) - Parallel Discrete Convolutions on Adaptive Particle Representations of Images [2.362412515574206]
We present data structures and algorithms for native implementations of discrete convolution operators over Adaptive Particle Representations.
The APR is a content-adaptive image representation that locally adapts the sampling resolution to the image signal.
We show that APR convolution naturally leads to scale-adaptive algorithms that efficiently parallelize on multi-core CPU and GPU architectures.
arXiv Detail & Related papers (2021-12-07T09:40:05Z) - Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
arXiv Detail & Related papers (2021-04-16T09:54:30Z) - SHEARer: Highly-Efficient Hyperdimensional Computing by Software-Hardware Enabled Multifold Approximation [7.528764144503429]
We propose SHEARer, an algorithm-hardware co-optimization to improve the performance and energy consumption of HD computing.
SHEARer achieves an average throughput boost of 104,904x (15.7x) and energy savings of up to 56,044x (301x) compared to state-of-the-art encoding methods.
We also develop a software framework that enables training HD models by emulating the proposed approximate encodings.
arXiv Detail & Related papers (2020-07-20T07:58:44Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z) - Efficient Video Semantic Segmentation with Labels Propagation and Refinement [138.55845680523908]
This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach.
We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) on the CPU, a very fast optical flow method that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next.
On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
arXiv Detail & Related papers (2019-12-26T11:45:15Z)