XBTorch: A Unified Framework for Modeling and Co-Design of Crossbar-Based Deep Learning Accelerators
- URL: http://arxiv.org/abs/2601.07086v1
- Date: Sun, 11 Jan 2026 22:35:30 GMT
- Authors: Osama Yousuf, Andreu L. Glasmann, Martin Lueker-Boden, Sina Najmaei, Gina C. Adam
- Abstract summary: This paper introduces XBTorch, a novel simulation framework that integrates seamlessly with PyTorch. XBTorch provides specialized tools for accurately and efficiently modeling crossbar-based systems based on emerging memory technologies.
- Score: 0.5834731599084116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emerging memory technologies have gained significant attention as a promising pathway to overcome the limitations of conventional computing architectures in deep learning applications. By enabling computation directly within memory, these technologies - built on nanoscale devices with tunable and nonvolatile conductance - offer the potential to drastically reduce energy consumption and latency compared to traditional von Neumann systems. This paper introduces XBTorch (short for CrossBarTorch), a novel simulation framework that integrates seamlessly with PyTorch and provides specialized tools for accurately and efficiently modeling crossbar-based systems based on emerging memory technologies. Through detailed comparisons and case studies involving hardware-aware training and inference, we demonstrate how XBTorch offers a unified interface for key research areas such as device-level modeling, cross-layer co-design, and inference-time fault tolerance. While exemplar studies utilize ferroelectric field-effect transistor (FeFET) models, the framework remains technology-agnostic - supporting other emerging memories such as resistive RAM (ReRAM), as well as enabling user-defined custom device models. The code is publicly available at: https://github.com/ADAM-Lab-GW/xbtorch
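As a rough illustration of the in-memory computation the abstract describes, the sketch below performs a matrix-vector product on an idealized crossbar: signed weights are mapped to pairs of quantized conductances, and each output is the column current given by Ohm's and Kirchhoff's laws. This is not XBTorch code and does not use its API; the conductance range, the 16-level quantization, the differential-pair mapping, and every function name are assumptions chosen for illustration.

```python
# Illustrative sketch only: plain Python, NOT the XBTorch API.
G_MIN, G_MAX = 1e-6, 1e-4  # assumed device conductance range in siemens
LEVELS = 16                # assumed number of programmable states per device

def quantize(g):
    """Snap a conductance to the nearest of LEVELS evenly spaced states."""
    step = (G_MAX - G_MIN) / (LEVELS - 1)
    g = min(max(g, G_MIN), G_MAX)
    return G_MIN + round((g - G_MIN) / step) * step

def program(weights):
    """Map signed weights to a differential pair of conductance matrices."""
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = (G_MAX - G_MIN) / w_max  # siemens per unit weight
    g_pos = [[quantize(G_MIN + scale * max(w, 0.0)) for w in row] for row in weights]
    g_neg = [[quantize(G_MIN + scale * max(-w, 0.0)) for w in row] for row in weights]
    return g_pos, g_neg, scale

def crossbar_mvm(g_pos, g_neg, scale, voltages):
    """Sum per-device currents along each column, then rescale to weight units."""
    outputs = []
    for j in range(len(g_pos[0])):
        current = sum(v * (g_pos[i][j] - g_neg[i][j]) for i, v in enumerate(voltages))
        outputs.append(current / scale)
    return outputs

# Rows correspond to input lines, columns to output lines.
weights = [[0.5, -1.0], [0.25, 0.75], [-0.5, 0.0]]
x = [1.0, 2.0, -1.0]

g_pos, g_neg, scale = program(weights)
ideal = [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(2)]
analog = crossbar_mvm(g_pos, g_neg, scale, x)
print("ideal: ", ideal)
print("analog:", analog)
```

The gap between the two printed vectors comes only from the limited number of conductance states. A device-level simulator would additionally model effects such as programming noise or drift, which this sketch leaves out.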
Related papers
- RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis [53.90240071275054]
The transition toward localized intelligence through Small Language Models (SLMs) has intensified the need for rigorous performance characterization on resource-constrained edge hardware. We propose a systematic framework that unifies architectural primitives and hardware constraints through the lens of operational intensity (OI). By defining an inference-potential region, we introduce the Relative Inference Potential as a novel metric to compare efficiency differences between Large Language Models (LLMs) on the same hardware substrate.
arXiv Detail & Related papers (2026-02-12T03:02:22Z)
- Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices [1.2483467287071346]
We present a compression scheme for boosted decision trees, addressing the growing need for lightweight machine learning models. We show that models achieved the same performance with a compression ratio of 4-16x compared to LightGBM models. This capability opens the door to a wide range of IoT applications, including remote monitoring, edge analytics, and real-time decision making in isolated or power-limited environments.
arXiv Detail & Related papers (2025-10-30T14:47:57Z)
- In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning [59.091567092071564]
In-memory training typically requires at least 8-bit conductance states to match digital baselines. Many promising memristive devices such as ReRAM offer only about 4-bit resolution due to fabrication constraints. This paper proposes a residual learning framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors.
arXiv Detail & Related papers (2025-10-02T19:44:25Z)
- Bruno: Backpropagation Running Undersampled for Novel device Optimization [37.69303106863453]
We present a bottom-up approach to train neural networks for hardware based on spiking neurons and synapses built on ferroelectric non-volatile devices (RRAM). The training algorithm is then tested on a dataset with a network composed of quantized synapses based on RRAM and ferroelectric integrate-and-fire neurons.
arXiv Detail & Related papers (2025-05-23T12:06:43Z)
- On Accelerating Edge AI: Optimizing Resource-Constrained Environments [1.7355861031903428]
Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations. We present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints.
arXiv Detail & Related papers (2025-01-25T01:37:03Z)
- NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals [58.83169560132308]
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks.
arXiv Detail & Related papers (2024-07-18T17:59:01Z)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- AnalogNAS: A Neural Network Design Framework for Accurate Inference with Analog In-Memory Computing [7.596833322764203]
Inference at the edge requires low latency, compact and power-efficient models.
Analog/mixed-signal in-memory computing hardware accelerators can easily transcend the memory wall of von Neumann architectures.
We propose AnalogNAS, a framework for automated Deep Neural Network (DNN) design targeting deployment on analog In-Memory Computing (IMC) inference accelerators.
arXiv Detail & Related papers (2023-05-17T07:39:14Z)
- In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for ANN enabling artificial intelligence (AI) and machine learning (ML) applications.
Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing multiple rows of the array per precharge cycle.
The proposed architecture was trained and tested on the IRIS dataset and is $46\times$ more energy efficient per MAC (multiply and accumulate) operation compared to earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)
- Neural Network Compression Framework for fast model inference [59.65531492759006]
We present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF).
It leverages recent advances of various network compression methods and implements some of them, such as sparsity, quantization, and binarization.
The framework can be used within the training samples, which are supplied with it, or as a standalone package that can be seamlessly integrated into the existing training code.
arXiv Detail & Related papers (2020-02-20T11:24:01Z)