Related papers: CMOS-based Single-Cycle In-Memory XOR/XNOR

URL: http://arxiv.org/abs/2310.18375v1
Date: Thu, 26 Oct 2023 21:43:01 GMT
Title: CMOS-based Single-Cycle In-Memory XOR/XNOR
Authors: Shamiul Alam, Jack Hutchins, Nikhil Shukla, Kazi Asifuzzaman, Ahmedullah Aziz
Abstract summary: We propose a CMOS-based hardware topology for single-cycle in-memory XOR/XNOR operations. Our design provides at least 2 times improvement in the latency compared with other existing CMOS-compatible solutions. This all-CMOS design paves the way for practical implementation of CiM XOR/XNOR at scaled technology nodes.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Big data applications are on the rise, and so is the number of data centers. The ever-increasing massive data pool needs to be periodically backed up in a secure environment. Moreover, a massive amount of securely backed-up data is required for training binary convolutional neural networks for image classification. XOR and XNOR operations are essential for large-scale data copy verification, encryption, and classification algorithms. The disproportionate speed of existing compute and memory units makes the von Neumann architecture inefficient to perform these Boolean operations. Compute-in-memory (CiM) has proved to be an optimum approach for such bulk computations. The existing CiM-based XOR/XNOR techniques either require multiple cycles for computing or add to the complexity of the fabrication process. Here, we propose a CMOS-based hardware topology for single-cycle in-memory XOR/XNOR operations. Our design provides at least 2 times improvement in the latency compared with other existing CMOS-compatible solutions. We verify the proposed system through circuit/system-level simulations and evaluate its robustness using a 5000-point Monte Carlo variation analysis. This all-CMOS design paves the way for practical implementation of CiM XOR/XNOR at scaled technology nodes.
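As a software-only illustration of why XOR and XNOR matter for the workloads named in the abstract, the sketch below shows XOR-based copy verification and an XNOR-popcount dot product of the kind used in binary neural networks. It is not the paper's circuit; the function names `copies_match` and `xnor_popcount_dot` and the bit-packing convention (1 for +1, 0 for -1) are assumptions made for this example.

```python
def copies_match(original: bytes, backup: bytes) -> bool:
    """Return True when every byte pair XORs to zero, i.e. the copies agree."""
    if len(original) != len(backup):
        return False
    diff = 0
    for a, b in zip(original, backup):
        diff |= a ^ b  # any differing bit leaves a 1 in diff
    return diff == 0


def xnor_popcount_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two length-n {-1, +1} vectors packed as bits.

    Assumed encoding: bit 1 stands for +1, bit 0 stands for -1.
    """
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask  # XNOR: 1 wherever the two bits match
    matches = bin(agree).count("1")    # popcount of the agreeing positions
    return 2 * matches - n             # matches minus mismatches
```

For instance, `xnor_popcount_dot(0b1011, 0b1001, 4)` compares two 4-element vectors that agree in three positions and differ in one, so the dot product is 2.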

Related papers

Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation [129.45368843861917]
We introduce the Gated Memory Unit (GMU), a simple yet effective mechanism for efficient memory sharing across layers. We apply it to create SambaY, a decoder-hybrid-decoder architecture that incorporates GMUs to share memory readout states from a Samba-based self-decoder.
arXiv Detail & Related papers (2025-07-09T07:27:00Z)
MINIMALIST: switched-capacitor circuits for efficient in-memory computation of gated recurrent units [0.4941855521192951]
Recurrent neural networks (RNNs) have been a long-standing candidate for processing of temporal sequence data. Recent advances in training paradigms have now inspired new generations of efficient RNNs. We introduce a streamlined and hardware-compatible architecture based on minimal gated recurrent units (GRUs).
arXiv Detail & Related papers (2025-05-13T14:13:41Z)
Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM [7.949839381468341]
Elliptic curve cryptography (ECC) is widely used in security applications such as public key cryptography (PKC) and zero-knowledge proofs (ZKP).
arXiv Detail & Related papers (2024-02-21T22:26:44Z)
In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields. Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices. We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
A Theory of I/O-Efficient Sparse Neural Network Inference [17.862408781750126]
Machine learning models increase their accuracy at a fast rate, so their demand for energy and compute resources increases. On a low level, the major part of these resources is consumed by data movement between different memory units. We provide a rigorous theoretical analysis of the I/Os needed in sparse feedforward neural network (FFNN) inference.
arXiv Detail & Related papers (2023-01-03T11:23:46Z)
HD-cos Networks: Efficient Neural Architectures for Secure Multi-Party Computation [26.67099154998755]
Multi-party computation (MPC) is a branch of cryptography where multiple non-colluding parties execute a protocol to securely compute a function. We study training and inference of neural networks under the MPC setup. We show that both of the approaches enjoy strong theoretical motivations and efficient computation under the MPC setup.
arXiv Detail & Related papers (2021-10-28T21:15:11Z)
Faster Secure Data Mining via Distributed Homomorphic Encryption [108.77460689459247]
Homomorphic Encryption (HE) is receiving more and more attention recently for its capability to do computations over the encrypted field. We propose a novel general distributed HE-based data mining framework towards one step of solving the scaling problem. We verify the efficiency and effectiveness of our new framework by testing over various data mining algorithms and benchmark data-sets.
arXiv Detail & Related papers (2020-06-17T18:14:30Z)
In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for ANN enabling artificial intelligence (AI) and machine learning (ML) applications. Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing multiple rows of the array per precharge cycle. The proposed architecture was trained and tested on the IRIS dataset and is $46\times$ more energy efficient per MAC (multiply and accumulate) operation than earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)
One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge. One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition. Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs. At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation. We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
arXiv Detail & Related papers (2020-04-13T23:09:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.