Related papers: MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures

MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures

URL: http://arxiv.org/abs/2502.07834v1
Date: Tue, 11 Feb 2025 00:53:15 GMT
Title: MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures
Authors: Do Yeong Kang, Yeong Hwan Oh, Chanwook Hwang, Jinhee Kim, Kang Eun Jeon, Jong Hwan Ko,
Abstract summary: MEMHD is a Memory-Efficient Multi-centroid HDC framework designed to address these challenges.<n>Our approach achieves full utilization of IMC arrays and enables one-shot (or few-shot) associative search.<n>MEMHD reduces computation cycles by up to 80x and array usage by up to 71x compared to baseline IMC mapping methods.
Score: 7.990774970571298
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The implementation of Hyperdimensional Computing (HDC) on In-Memory Computing (IMC) architectures faces significant challenges due to the mismatch between highdimensional vectors and IMC array sizes, leading to inefficient memory utilization and increased computation cycles. This paper presents MEMHD, a Memory-Efficient Multi-centroid HDC framework designed to address these challenges. MEMHD introduces a clustering-based initialization method and quantization aware iterative learning for multi-centroid associative memory. Through these approaches and its overall architecture, MEMHD achieves a significant reduction in memory requirements while maintaining or improving classification accuracy. Our approach achieves full utilization of IMC arrays and enables one-shot (or few-shot) associative search. Experimental results demonstrate that MEMHD outperforms state-of-the-art binary HDC models, achieving up to 13.69% higher accuracy with the same memory usage, or 13.25x more memory efficiency at the same accuracy level. Moreover, MEMHD reduces computation cycles by up to 80x and array usage by up to 71x compared to baseline IMC mapping methods when mapped to 128x128 IMC arrays, while significantly improving energy and computation cycle efficiency.

Related papers

MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models [72.61076288351201]
We propose Memory-efficient Offloaded Mini-sequence Inference (MOM) MOM partitions critical layers into smaller "mini-sequences" and integrates seamlessly with KV cache offloading. On Meta-Llama-3.2-8B, MOM extends the maximum context length from 155k to 455k tokens on a single A100 80GB GPU.
arXiv Detail & Related papers (2025-04-16T23:15:09Z)
LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention. For longer and higher-resolution videos, it matched STM-based methods with 53% less GPU memory and supports 4096p inference on a 32G consumer-grade GPU.
arXiv Detail & Related papers (2024-11-05T05:36:17Z)
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrary small blocks. We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems. Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
arXiv Detail & Related papers (2024-10-22T17:59:30Z)
Approximate ADCs for In-Memory Computing [5.1793930906065775]
In memory computing (IMC) architectures for deep learning (DL) accelerators leverage energy-efficient and highly parallel matrix vector multiplication (MVM) operations. Recently reported designs reveal that the ADCs required for reading out the MVM results, consume more than 85% of the total compute power and also dominate the area. In this work we present peripheral aware design of IMC cores, to mitigate such overheads.
arXiv Detail & Related papers (2024-08-11T05:59:59Z)
vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving [53.972175896814505]
Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests. Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests.
arXiv Detail & Related papers (2024-07-22T14:37:58Z)
Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures [0.1747623282473278]
We present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures. We integrate CLSA-CIM with existing weight-mapping strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms.
arXiv Detail & Related papers (2024-01-15T13:35:21Z)
WWW: What, When, Where to Compute-in-Memory [6.944507442667191]
Compute-in-memory (CiM) has emerged as a highly energy efficient solution for performing matrix multiplication during Machine Learning (ML) inference. Here, we investigate what type of CiM to use, when and where to optimally integrate it in the cache hierarchy for efficient matrix multiplication. Our experiments show that CiM integrated memory improves energy efficiency by up to 3.4x and throughput by up to 15.6x compared to tensor-core-like baseline architecture.
arXiv Detail & Related papers (2023-12-26T06:16:12Z)
HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing [58.720142291102135]
This work introduces the name compiler, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code. name is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend. To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
arXiv Detail & Related papers (2023-04-24T19:16:03Z)
Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method to solve large-scale multimedia retrieval issue. We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues. Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
Brain-inspired Cognition in Next Generation Racetrack Memories [0.6850683267295249]
Hyperdimensional computing (HDC) is an emerging computational framework inspired by the brain that operates on vectors with thousands of dimensions to emulate cognition. This paper presents an architecture based on racetrack memory (RTM) to conduct and accelerate the entire HDC framework within the memory. The proposed solution requires minimal additional CMOS circuitry and uses a read operation across multiple domains in RTMs called transverse read (TR) to realize exclusive-or (XOR) and addition operations.
arXiv Detail & Related papers (2021-11-03T14:21:39Z)
Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System [3.202860612193139]
We propose an artificially intelligent memory mapping scheme, AIMM, that optimize data placement and resource utilization through page and computation remapping. AIMM uses a neural network to achieve a near-optimal mapping during execution, trained using a reinforcement learning algorithm. Our experimental evaluation shows that AIMM improves the baseline NMP performance in single and multiple program scenario by up to 70% and 50%, respectively.
arXiv Detail & Related papers (2021-04-28T09:50:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.