GPU-accelerated simulated annealing based on p-bits with real-world device-variability modeling
- URL: http://arxiv.org/abs/2601.14476v1
- Date: Tue, 20 Jan 2026 20:59:21 GMT
- Title: GPU-accelerated simulated annealing based on p-bits with real-world device-variability modeling
- Authors: Naoya Onizawa, Takahiro Hanyu
- Abstract summary: Probabilistic computing using probabilistic bits (p-bits) is an efficient alternative to CMOS logic for complex problem-solving. This paper introduces a GPU-accelerated, open-source simulated annealing framework based on p-bits. By providing a scalable and accessible tool, this framework aims to advance research in probabilistic computing.
- Score: 1.2375561840897742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Probabilistic computing using probabilistic bits (p-bits) presents an efficient alternative to traditional CMOS logic for complex problem-solving, including simulated annealing and machine learning. Realizing p-bits with emerging devices such as magnetic tunnel junctions (MTJs) introduces device variability, which was expected to negatively impact computational performance. However, this study reveals an unexpected finding: device variability can not only degrade but also enhance algorithm performance, particularly by leveraging timing variability. This paper introduces a GPU-accelerated, open-source simulated annealing framework based on p-bits that models key device-variability factors (timing, intensity, and offset) to reflect real-world device behavior. Through CUDA-based simulations, our approach achieves a two-order-of-magnitude speedup over CPU implementations on the MAX-CUT benchmark with problem sizes ranging from 800 to 20,000 nodes. By providing a scalable and accessible tool, this framework aims to advance research in probabilistic computing, enabling optimization applications in diverse fields.
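The core p-bit update behind such a framework can be sketched in plain Python (the paper's CUDA implementation parallelizes this across bits). The variability knobs `intensity_var`, `offset_var`, and `timing_skip` below are illustrative stand-ins for the timing, intensity, and offset factors named in the abstract, not the paper's actual parameterization.

```python
import numpy as np

def pbit_sweep(J, h, m, beta, rng, intensity_var=0.05, offset_var=0.02, timing_skip=0.1):
    """One asynchronous sweep of p-bit updates with simple device-variability knobs.

    Hypothetical model: intensity_var scales each bit's activation slope,
    offset_var shifts its activation, and timing_skip is the probability a
    bit misses this update (timing variability).
    """
    n = m.size
    gain = 1.0 + intensity_var * rng.standard_normal(n)   # intensity variability
    offset = offset_var * rng.standard_normal(n)          # offset variability
    for i in rng.permutation(n):
        if rng.random() < timing_skip:                    # timing variability: bit skips
            continue
        local_field = J[i] @ m + h[i]
        p_up = 0.5 * (1.0 + np.tanh(beta * gain[i] * local_field + offset[i]))
        m[i] = 1 if rng.random() < p_up else -1
    return m
```

Annealing then amounts to repeating such sweeps while raising the inverse temperature `beta`; with antiferromagnetic couplings `J = -A` for a graph adjacency matrix `A`, low-energy spin states correspond to large cuts.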
Related papers
- Comparing performance of variational quantum algorithm simulations on HPC systems [0.545520830707066]
Variational quantum algorithms are of special importance because of their applicability to current Noisy Intermediate-Scale Quantum (NISQ) devices. The main building blocks of these algorithms (among them, the definition of the Hamiltonian and of the ansatz) define a relatively large parameter space. We employ a generic description of the problem, in terms of both Hamiltonian and ansatz, to port a problem definition consistently among different simulators.
arXiv Detail & Related papers (2025-07-23T15:46:54Z) - Scaling Probabilistic Circuits via Monarch Matrices [109.65822339230853]
Probabilistic Circuits (PCs) are tractable representations of probability distributions. We propose a novel sparse and structured parameterization for the sum blocks in PCs.
arXiv Detail & Related papers (2025-06-14T07:39:15Z) - Benchmarking Energy and Latency in TinyML: A Novel Method for Resource-Constrained AI [0.0]
This work introduces an alternative benchmarking methodology that integrates energy and latency measurements. To evaluate our setup, we tested the STM32N6 MCU, which includes an NPU for executing neural networks. Our findings demonstrate that reducing the core voltage and clock frequency improves the efficiency of pre- and post-processing.
arXiv Detail & Related papers (2025-05-21T15:12:14Z) - A CMOS Probabilistic Computing Chip With In-situ hardware Aware Learning [0.0]
This paper demonstrates a probabilistic-bit physics-inspired solver with 440 spins configured in a Chimera graph, occupying an area of 0.44 mm^2. We validate the chip's ability to perform probabilistic computing tasks such as modeling logic gates and full adders, as well as optimization tasks such as MaxCut.
arXiv Detail & Related papers (2025-04-18T20:40:48Z) - A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures [73.65190161312555]
ARCANA is a software spiking neural network simulator designed to account for the properties of mixed-signal neuromorphic circuits. We show how the results obtained provide a reliable estimate of the behavior of the spiking neural network trained in software, once deployed in hardware.
arXiv Detail & Related papers (2024-09-23T11:16:46Z) - Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs.
At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads.
At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z) - Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z) - Modular Simulation Framework for Process Variation Analysis of MRAM-based Deep Belief Networks [2.0222827433041535]
Magnetic Random-Access Memory (MRAM) based p-bit neuromorphic computing devices are garnering increasing interest as a means to compactly and efficiently realize machine learning operations in Restricted Boltzmann Machines (RBMs). The stochasticity of activation depends on the energy barrier of the MRAM device, making it essential to assess the impact of process variation on the voltage-dependent behavior of the sigmoid function.
Here, transportable Python scripts are developed to analyze the output variation under changes in device dimensions on the accuracy of machine learning applications.
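A minimal version of such a variation analysis can be sketched as follows. The function name `mtj_sigmoid`, the sigmoid scaling by the energy barrier, and the Gaussian spread on barrier height are illustrative assumptions for this sketch, not the paper's calibrated device model.

```python
import numpy as np

rng = np.random.default_rng(42)

def mtj_sigmoid(v, delta):
    """Voltage-dependent activation of an MRAM/MTJ-based p-bit.

    Hypothetical model: a larger energy barrier `delta` (in units of kT)
    steepens the sigmoid around the 50%-switching point at v = 0.
    """
    return 1.0 / (1.0 + np.exp(-delta * v))

# Process variation: device-dimension changes modeled as a Gaussian spread
# on the energy barrier (10% sigma; numbers are illustrative).
n_devices = 1000
delta = rng.normal(40.0, 4.0, size=n_devices)

v = np.linspace(-0.2, 0.2, 101)                   # input-voltage sweep
curves = mtj_sigmoid(v[None, :], delta[:, None])  # one activation curve per device
spread = curves.std(axis=0)                       # output variation vs. voltage
```

The `spread` array gives the device-to-device output variation at each input voltage, which is the quantity one would feed into a downstream accuracy analysis for an RBM.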
arXiv Detail & Related papers (2020-02-03T17:20:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.