Related papers: A 5 μW Standard Cell Memory-based Configurable Hyperdimensional Computing Accelerator for Always-on Smart Sensing

URL: http://arxiv.org/abs/2102.02758v1
Date: Thu, 4 Feb 2021 17:41:29 GMT
Title: A 5 μW Standard Cell Memory-based Configurable Hyperdimensional Computing Accelerator for Always-on Smart Sensing
Authors: Manuel Eggimann, Abbas Rahimi, Luca Benini
Abstract summary: Hyperdimensional computing (HDC) is a brain-inspired computing paradigm based on high-dimensional holistic representations of vectors. We propose a programmable all-digital CMOS implementation of a fully autonomous HDC accelerator for always-on classification in energy-constrained sensor nodes.
Score: 16.589169601764297
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hyperdimensional computing (HDC) is a brain-inspired computing paradigm based on high-dimensional holistic representations of vectors. It has recently gained attention for embedded smart sensing due to its inherent error resiliency and its suitability for highly parallel hardware implementations. In this work, we propose a programmable all-digital CMOS implementation of a fully autonomous HDC accelerator for always-on classification in energy-constrained sensor nodes. By using energy-efficient standard cell memory (SCM), the design is easily cross-technology mappable. It achieves extremely low power, 5 μW in typical applications, and an energy-efficiency improvement of up to 3× over state-of-the-art (SoA) digital architectures in post-layout simulations for always-on wearable tasks such as EMG gesture recognition. As part of the accelerator's architecture, we introduce novel hardware-friendly embodiments of common HDC algorithmic primitives, which result in a 3.3× technology-scaled area reduction over the SoA while achieving the same accuracy levels on all examined targets. The proposed architecture also has a fully configurable datapath using microcode optimized for HDC, stored in an integrated SCM-based configuration memory, making the design "general-purpose" in terms of HDC algorithm flexibility. This flexibility allows the accelerator to be used across novel HDC tasks, for instance, a newly designed HDC algorithm applied to ball bearing fault detection.
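The common HDC primitives mentioned in the abstract can be illustrated with a minimal software sketch. It assumes dense binary hypervectors, XOR binding, bitwise-majority bundling, and Hamming-distance associative-memory lookup, which are the usual textbook choices; it is not the accelerator's hardware embodiment. The dimensionality, the channel/level item memories, and the toy "rest"/"fist" gesture samples are hypothetical.

```python
import random

D = 2048  # hypervector dimensionality (illustrative; HDC typically uses thousands of bits)

def random_hv(rng: random.Random) -> int:
    """Draw a dense binary hypervector with i.i.d. random bits, stored as a Python int."""
    return rng.getrandbits(D)

def bind(a: int, b: int) -> int:
    """Binding: component-wise XOR; the result is dissimilar to both inputs."""
    return a ^ b

def bundle(hvs: list[int]) -> int:
    """Bundling: bitwise majority vote; the result stays similar to every input.
    Ties (possible with an even count) are broken toward 0 in this sketch."""
    out = 0
    for i in range(D):
        ones = sum((hv >> i) & 1 for hv in hvs)
        if 2 * ones > len(hvs):
            out |= 1 << i
    return out

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hypervectors."""
    return bin(a ^ b).count("1")

def classify(query: int, prototypes: dict[str, int]) -> str:
    """Associative-memory lookup: return the label of the nearest prototype."""
    return min(prototypes, key=lambda label: hamming(query, prototypes[label]))

rng = random.Random(0)
channels = [random_hv(rng) for _ in range(4)]  # hypothetical item memory: one HV per sensor channel
levels = [random_hv(rng) for _ in range(8)]    # hypothetical item memory: one HV per quantized level

def encode(sample: list[int]) -> int:
    """Spatial encoding: bundle each channel HV bound to the HV of its quantized level."""
    return bundle([bind(ch, levels[q]) for ch, q in zip(channels, sample)])

prototypes = {"rest": encode([0, 0, 1, 0]), "fist": encode([7, 6, 7, 5])}
print(classify(encode([0, 1, 1, 0]), prototypes))  # prints "rest"
```

The query shares three of four channel/level bindings with the "rest" prototype, so its Hamming distance to that prototype is far smaller than to "fist"; this robustness to partial mismatch is the error resiliency the abstract refers to.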

Related papers

Toward Large-Scale Photonics-Empowered AI Systems: From Physical Design Automation to System-Algorithm Co-Exploration [5.036634263468385]
SimPhony provides implementation-aware modeling and rapid cross-layer evaluation. ADEPT and ADEPT-Z enable end-to-end circuit and topology exploration. Apollo and LiDAR provide scalable photonic physical design automation.
arXiv Detail & Related papers (2025-12-31T22:21:42Z)
Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization [14.87046071090259]
3D Gaussian Splatting (3DGS) has recently gained significant attention for high-quality and efficient view synthesis. Despite its impressive algorithmic performance, real-time rendering on resource-constrained devices remains a major challenge due to tight power and area budgets.
arXiv Detail & Related papers (2025-06-08T10:14:54Z)
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge [55.75103034526652]
We propose QuartDepth which adopts post-training quantization to quantize MDE models with hardware accelerations for ASICs. Our approach involves quantizing both weights and activations to 4-bit precision, reducing the model size and computation cost. We design a flexible and programmable hardware accelerator by supporting kernel fusion and customized instruction programmability.
arXiv Detail & Related papers (2025-03-20T21:03:10Z)
HD-CB: The First Exploration of Hyperdimensional Computing for Contextual Bandits Problems [0.6377289349842638]
This work introduces Hyperdimensional Contextual Bandits (HD-CB). HD-CB is the first exploration of HDC for modeling and automating sequential decision-making problems. It consistently achieves competitive or superior performance compared to traditional linear CB algorithms.
arXiv Detail & Related papers (2025-01-28T11:28:09Z)
Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator [0.0]
We present the NV-1, a new low-power ASIC AI processor that greatly accelerates parallel processing (> 10X) with dramatic reduction in energy consumption. The resulting device is currently being used in a fielded edge sensor application.
arXiv Detail & Related papers (2024-09-28T15:47:16Z)
PowerYOLO: Mixed Precision Model for Hardware Efficient Object Detection with Event Data [0.5461938536945721]
PowerYOLO is a mixed precision solution to the problem of fitting algorithms of high memory and computational complexity into small low-power devices. First, we propose a system based on a Dynamic Vision Sensor (DVS), a novel sensor, that offers low power requirements. Second, to ensure high accuracy and low memory and computational complexity, we propose to use 4-bit width Powers-of-Two (PoT) quantisation. Third, we replace multiplication with bit-shifting to increase the efficiency of hardware acceleration of such solution.
arXiv Detail & Related papers (2024-07-11T08:17:35Z)
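The idea of replacing multiplication with bit-shifting under Powers-of-Two (PoT) quantization can be sketched as follows. This is a generic PoT scheme, not PowerYOLO's exact method: the 4-bit layout (1 sign bit plus a 3-bit exponent covering 2^-7 to 2^0), rounding in the log domain, integer activations, and the `pot_quantize`/`pot_dot` helpers are all assumptions for illustration.

```python
import math

# Assumed 4-bit weight code: 1 sign bit + 3-bit exponent, so |w| is in {2**-7, ..., 2**0}.
EXP_MIN, EXP_MAX = -7, 0

def pot_quantize(w: float) -> tuple[int, int]:
    """Quantize w to sign * 2**e by rounding log2(|w|) and clamping e to the exponent range.
    Exact zeros get sign 0 (a simplification; a strict 4-bit code has no zero)."""
    if w == 0.0:
        return 0, EXP_MIN
    e = round(math.log2(abs(w)))
    e = max(EXP_MIN, min(EXP_MAX, e))
    return (1 if w > 0 else -1), e

def pot_dot(xs: list[int], ws: list[tuple[int, int]]) -> int:
    """Dot product of integer activations with PoT weights using only shifts and adds.
    x * 2**e is computed as x << (e - EXP_MIN), so the result is a fixed-point
    value scaled by 2**-EXP_MIN (divide by 2**-EXP_MIN to get the real value)."""
    acc = 0
    for x, (sign, e) in zip(xs, ws):
        term = x << (e - EXP_MIN)  # the shift replaces the multiplier
        acc += sign * term         # sign is -1, 0 or +1: a negate/skip in hardware
    return acc

ws = [pot_quantize(w) for w in [0.3, -0.9, 0.05]]  # [(1, -2), (-1, 0), (1, -4)]
acc = pot_dot([10, 4, -8], ws)
print(acc, acc / 2 ** -EXP_MIN)  # prints "-256 -2.0"
```

The result equals 10·0.25 + 4·(−1) + (−8)·0.0625 = −2.0, computed without any general-purpose multiplier; accuracy then depends on how well each weight is approximated by its nearest power of two.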
Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis. We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
HDReason: Algorithm-Hardware Codesign for Hyperdimensional Knowledge Graph Reasoning [18.790512589967875]
Brain-inspired HyperDimensional Computing (HDC) has been introduced as a promising solution for lightweight machine learning. In this paper, we leverage HDC for an intrinsically more efficient and acceleration-friendly Knowledge Graph Completion (KGC) algorithm. We also co-design an acceleration framework named HDReason targeting FPGA platforms.
arXiv Detail & Related papers (2024-03-09T02:17:43Z)
Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, the random resistive memory-based deep extreme point learning machine (DEPLM). Our co-design system achieves large energy-efficiency improvements and training-cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
A Novel Implementation Methodology for Error Correction Codes on a Neuromorphic Architecture [0.8021197489470758]
We propose a methodology to map the hard-decision class of decoder algorithms on a neuromorphic architecture. We present the implementation of the Gallager B decoding algorithm on a TrueNorth-inspired architecture that is emulated on the Xilinx Zynq ZCU102 MPSoC.
arXiv Detail & Related papers (2023-06-06T20:49:10Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with offline training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA and uses around 40% of the available hardware resources in total. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations. Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Brain-inspired Cognition in Next Generation Racetrack Memories [0.6850683267295249]
Hyperdimensional computing (HDC) is an emerging computational framework inspired by the brain that operates on vectors with thousands of dimensions to emulate cognition. This paper presents an architecture based on racetrack memory (RTM) to conduct and accelerate the entire HDC framework within the memory. The proposed solution requires minimal additional CMOS circuitry and uses a read operation across multiple domains in RTMs called transverse read (TR) to realize exclusive-or (XOR) and addition operations.
arXiv Detail & Related papers (2021-11-03T14:21:39Z)
Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks. We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
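Decomposing a quantized network into multi-branch binary networks with {-1, +1} encoding can be illustrated on a single dot product. The summary does not give the exact encoding, so this sketch assumes the odd-integer level set {-(2^M - 1), ..., -1, 1, ..., 2^M - 1} for M-bit values, which admits the exact split q = Σ 2^i·s_i with every s_i in {-1, +1}; the helper names `decompose` and `branch_dot` are hypothetical.

```python
def decompose(q: int, bits: int) -> list[int]:
    """Split an odd integer q in [-(2**bits - 1), 2**bits - 1] into branch values
    s_i in {-1, +1} with q == sum(2**i * s_i for i in range(bits))."""
    if q % 2 == 0 or abs(q) > 2 ** bits - 1:
        raise ValueError("q must be odd and within the representable range")
    u = (q + 2 ** bits - 1) // 2  # unsigned code in [0, 2**bits - 1]
    return [2 * ((u >> i) & 1) - 1 for i in range(bits)]  # bit 0 -> -1, bit 1 -> +1

def branch_dot(ws: list[int], xs: list[int], bits: int) -> int:
    """Compute sum(w * x) as a 2**i-weighted sum of binary-branch dot products."""
    branches = [decompose(w, bits) for w in ws]
    total = 0
    for i in range(bits):
        # Branch i holds only +-1 weights, so its dot product needs only additions/subtractions.
        total += (1 << i) * sum(b[i] * x for b, x in zip(branches, xs))
    return total

ws, xs = [3, -1, 1, -3], [2, 5, -4, 1]
print(decompose(-1, 2), branch_dot(ws, xs, 2), sum(w * x for w, x in zip(ws, xs)))
# prints "[1, -1] -6 -6"
```

Each branch is a purely binary network that can run on binary-network kernels, and the branch outputs are recombined with power-of-two scales (shifts), which matches the exact dot product.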

This list is automatically generated from the titles and abstracts of the papers on this site.