Physics-inspired Ising Computing with Ring Oscillator Activated p-bits
- URL: http://arxiv.org/abs/2205.07402v1
- Date: Sun, 15 May 2022 23:46:58 GMT
- Title: Physics-inspired Ising Computing with Ring Oscillator Activated p-bits
- Authors: Navid Anjum Aadit, Andrea Grimaldi, Giovanni Finocchio, and Kerem Y.
Camsari
- Abstract summary: We design and implement a truly asynchronous and medium-scale p-computer with $\approx$ 800 p-bits.
We evaluate the performance of the asynchronous architecture against an ideal, synchronous design.
Our results highlight the promise of massively scaled p-computers with millions of free-running p-bits.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The nearing end of Moore's Law has been driving the development of
domain-specific hardware tailored to solve a special set of problems. Along
these lines, probabilistic computing with inherently stochastic building blocks
(p-bits) has shown significant promise, particularly in the context of hard
optimization and statistical sampling problems. p-bits have been proposed and
demonstrated in different hardware substrates ranging from small-scale
stochastic magnetic tunnel junctions (sMTJs) in asynchronous architectures to
large-scale CMOS in synchronous architectures. Here, we design and implement a
truly asynchronous and medium-scale p-computer (with $\approx$ 800 p-bits) that
closely emulates the asynchronous dynamics of sMTJs in Field Programmable Gate
Arrays (FPGAs). Using hard instances of the planted Ising glass problem on the
Chimera lattice, we evaluate the performance of the asynchronous architecture
against an ideal, synchronous design that performs parallelized (chromatic)
exact Gibbs sampling. We find that despite the lack of any careful
synchronization, the asynchronous design achieves parallelism with algorithmic
scaling comparable to that of the ideal, carefully tuned and parallelized synchronous
design. Our results highlight the promise of massively scaled p-computers with
millions of free-running p-bits made out of nanoscale building blocks such as
stochastic magnetic tunnel junctions.
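To make the update rule concrete, here is a minimal software sketch of asynchronous p-bit sampling on a toy Ising instance. It assumes the standard p-bit equation from the literature, m_i = sgn(tanh(beta * I_i) + r) with r drawn uniformly from (-1, 1) and I_i = sum_j J_ij m_j + h_i; the random-order sweep is only a software stand-in for free-running FPGA p-bits, and none of this is the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pbit_update(m, J, h, i, beta):
    """Sample p-bit i: m_i = sgn(tanh(beta * I_i) + r), r ~ U(-1, 1)."""
    I_i = J[i] @ m + h[i]              # local input to p-bit i
    return np.sign(np.tanh(beta * I_i) + rng.uniform(-1.0, 1.0))

def asynchronous_sweep(m, J, h, beta):
    """One sweep of updates in random order: a software stand-in for
    free-running p-bits that each update on their own clock."""
    for i in rng.permutation(len(m)):
        m[i] = pbit_update(m, J, h, i, beta)
    return m

# Toy Ising instance: random symmetric couplings on 16 spins
n = 16
J = rng.normal(size=(n, n))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)
h = np.zeros(n)
m = rng.choice([-1.0, 1.0], size=n)

for _ in range(1000):
    m = asynchronous_sweep(m, J, h, beta=1.0)

print("energy:", -0.5 * m @ J @ m - h @ m)
```

The synchronous baseline in the paper instead performs chromatic Gibbs sampling: the p-bits are graph-colored so that no two same-colored p-bits share a coupling, and each color group is then updated exactly and in parallel.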
Related papers
- MindFlayer: Efficient Asynchronous Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times [49.1574468325115]
We study the problem of minimizing the expectation of smooth nonconvex functions with the help of several parallel workers.
We propose a new asynchronous SGD method, MindFlayer SGD, designed for the setting in which the workers' compute times are random and possibly heavy tailed.
Our theoretical and empirical results demonstrate the superiority of MindFlayer SGD in cases where the compute-time noise is heavy tailed.
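The abstract gives the setting but not the update rule, so the following is only a generic asynchronous-SGD sketch with heavy-tailed worker compute times (simulated here with a Pareto distribution), not MindFlayer SGD itself; all names and constants are illustrative.

```python
import threading, time
import numpy as np

x = np.zeros(10)                       # shared parameters
lock = threading.Lock()
grad_f = lambda v: 2 * v               # toy objective f(x) = ||x||^2

def worker(seed, steps=50):
    global x
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        with lock:
            x_stale = x.copy()                       # read possibly stale parameters
        time.sleep(min(rng.pareto(2.0), 10) * 1e-3)  # heavy-tailed compute time
        g = grad_f(x_stale)                          # gradient of the stale iterate
        with lock:
            x -= 0.01 * g                            # apply delayed update

x += 5.0                                             # start away from the optimum
threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("||x|| after async training:", np.linalg.norm(x))
```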
arXiv Detail & Related papers (2024-10-05T21:11:32Z)
- Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers are typically not homogeneous and can have highly varying processing times.
We introduce a new parallel method, Freya PAGE, to handle arbitrarily slow computations.
We show that Freya PAGE offers significantly improved complexity guarantees compared to all previous methods.
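Judging by its name, Freya PAGE builds on the PAGE gradient estimator; below is a minimal single-machine sketch of plain PAGE on a least-squares toy problem. The asynchronous scheduling of arbitrarily slow workers, which is the paper's actual contribution, is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 5
A = rng.normal(size=(n, d)); b = rng.normal(size=n)

def grad_i(x, i):                     # gradient of f_i(x) = (a_i^T x - b_i)^2
    return 2 * A[i] * (A[i] @ x - b[i])

def full_grad(x):                     # gradient of f = (1/n) sum_i f_i
    return 2 * A.T @ (A @ x - b) / n

x = np.zeros(d)
g = full_grad(x)
p, lr = 0.1, 0.01
for t in range(500):
    x_prev, x = x, x - lr * g
    if rng.random() < p:              # occasional full-gradient refresh
        g = full_grad(x)
    else:                             # cheap incremental correction
        i = rng.integers(n)
        g = g + grad_i(x, i) - grad_i(x_prev, i)
print("loss:", np.mean((A @ x - b) ** 2))
```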
arXiv Detail & Related papers (2024-05-24T13:33:30Z)
- Scalable Parity Architecture With a Shuttling-Based Spin Qubit Processor [0.32985979395737786]
We present sequences of spin shuttling and quantum gates that implement the Parity Quantum Approximate Optimization Algorithm (QAOA).
We develop a detailed error model for a hardware-specific analysis of the Parity Architecture.
We find that with high-fidelity spin shuttling, the performance of the spin qubits is competitive with, or even exceeds, that of the transmons.
arXiv Detail & Related papers (2024-03-14T17:06:50Z)
- Parallelized Spatiotemporal Binding [47.67393266882402]
We introduce Parallelizable Spatiotemporal Binder or PSB, the first temporally-parallelizable slot learning architecture for sequential inputs.
Unlike conventional RNN-based approaches, PSB produces object-centric representations, known as slots, for all time-steps in parallel.
Compared to the state of the art, our architecture demonstrates stable training on longer sequences, achieves parallelization that results in a 60% increase in training speed, and yields performance that matches or exceeds it on unsupervised 2D and 3D object-centric scene decomposition and understanding.
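To make the sequential-versus-parallel contrast concrete, here is a toy sketch: an RNN-style binder whose slots at step t depend on step t-1, versus a batched attention update computed for all time-steps at once. The single dot-product attention step and all shapes are illustrative assumptions, not PSB's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, S, D = 8, 32, 4, 16        # time-steps, inputs per step, slots, feature dim
inputs = rng.normal(size=(T, N, D))
slots = rng.normal(size=(S, D))  # shared initial slots

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Sequential (RNN-like): slots at step t depend on slots at step t-1.
s, seq_slots = slots, []
for t in range(T):
    attn = softmax(s @ inputs[t].T, axis=-1)      # (S, N) attention weights
    s = attn @ inputs[t]                          # update slots from frame t
    seq_slots.append(s)

# Parallel (PSB-like): every step attends from the same initial slots,
# so all T updates reduce to one batched einsum with no time recurrence.
attn = softmax(np.einsum('sd,tnd->tsn', slots, inputs), axis=-1)
par_slots = np.einsum('tsn,tnd->tsd', attn, inputs)  # (T, S, D) at once
print(par_slots.shape)
```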
arXiv Detail & Related papers (2024-02-26T23:16:34Z)
- Robust Fully-Asynchronous Methods for Distributed Training over General Architecture [11.480605289411807]
Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, packet losses, and stragglers.
We propose a Robust Fully-Asynchronous Stochastic Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace, without any form of synchronization.
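For context, this is a minimal sketch of classical synchronous gradient tracking, the scheme that R-FAST (per the summary) makes robust and fully asynchronous; the asynchrony itself is the paper's contribution and is not modeled here. The ring topology and quadratic objectives are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_agents, d = 4, 3
targets = rng.normal(size=(n_agents, d))           # f_i(x) = ||x - t_i||^2
grad = lambda i, x: 2 * (x - targets[i])

# Doubly stochastic mixing matrix for a ring of 4 agents
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

X = np.zeros((n_agents, d))                        # local iterates
Y = np.array([grad(i, X[i]) for i in range(n_agents)])  # gradient trackers
lr = 0.1
for _ in range(200):
    X_new = W @ X - lr * Y                         # consensus step + descent
    # tracker update: mix neighbors' trackers, add local gradient change
    Y = W @ Y + np.array([grad(i, X_new[i]) - grad(i, X[i])
                          for i in range(n_agents)])
    X = X_new

print("consensus point:", X.mean(axis=0))          # approaches mean of targets
print("true minimizer: ", targets.mean(axis=0))
```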
arXiv Detail & Related papers (2023-07-21T14:36:40Z)
- Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts the imbalance between computation and memory by performing computations directly in analog memories.
This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa.
arXiv Detail & Related papers (2022-12-29T15:10:59Z)
- Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA [5.575293536755127]
Real-world applications require high performance inference on real-time streaming dynamic graphs.
We present a novel model-architecture co-design for inference in memory-based TGNNs on FPGAs.
We train our simplified models using knowledge distillation to ensure similar accuracy vis-à-vis the original model.
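The distillation step is standard; here is a minimal sketch of a Hinton-style knowledge-distillation loss that blends soft teacher targets with hard labels. The TGNN models and the FPGA co-design are not represented, and all hyperparameters are illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * soft-target cross-entropy at temperature T
    + (1 - alpha) * hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T) + 1e-12)
    soft = -(p_teacher * log_p_student_T).sum(axis=-1).mean() * T * T
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    hard = -log_p_student[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1 - alpha) * hard

# Toy batch: 4 examples, 3 classes
rng = np.random.default_rng(5)
t_logits = rng.normal(size=(4, 3))
s_logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 0])
print(distillation_loss(s_logits, t_logits, labels))
```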
arXiv Detail & Related papers (2022-03-10T00:24:47Z)
- Scaling Quantum Approximate Optimization on Near-term Hardware [49.94954584453379]
We quantify the scaling of the expected resource requirements of optimized circuits for hardware architectures with varying levels of connectivity.
We show that the number of measurements, and hence the total time to solution, grows exponentially with problem size and problem graph degree.
These problems may be alleviated by increasing hardware connectivity or by recently proposed modifications to the QAOA that achieve higher performance with fewer circuit layers.
arXiv Detail & Related papers (2022-01-06T21:02:30Z)
- Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers [55.90468016961356]
We propose an efficient token mixer that learns to mix in the Fourier domain.
AFNO is based on a principled foundation of operator learning.
It can handle a sequence size of 65k and outperforms other efficient self-attention mechanisms.
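The core mechanism is easy to sketch: FFT along the token dimension, a learned per-frequency transform, inverse FFT. The version below simplifies away AFNO's block-diagonal weights and soft thresholding, so treat it as the general Fourier-mixing idea rather than the paper's exact operator.

```python
import numpy as np

rng = np.random.default_rng(6)
n_tokens, d = 64, 16
x = rng.normal(size=(n_tokens, d))

# Learned complex per-frequency, per-channel weights (toy initialization)
n_freq = n_tokens // 2 + 1
w = rng.normal(size=(n_freq, d)) + 1j * rng.normal(size=(n_freq, d))

def fourier_mix(x, w):
    X = np.fft.rfft(x, axis=0)         # mix information across all tokens
    X = X * w                          # cheap O(n log n) global mixing
    return np.fft.irfft(X, n=x.shape[0], axis=0)

y = fourier_mix(x, w)
print(y.shape)                         # (64, 16): same shape, globally mixed
```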
arXiv Detail & Related papers (2021-11-24T05:44:31Z)
- Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum of the model under mild assumptions on the delays.
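The delayed-update dynamics are simple to illustrate: each step applies a gradient evaluated at an iterate from tau steps earlier. The fixed delay and toy quadratic below are assumptions for illustration only.

```python
import numpy as np

grad = lambda x: 2 * x                 # f(x) = ||x||^2
lr, tau, T = 0.05, 5, 200
history = [np.full(3, 10.0)]           # iterate history, x_0 = (10, 10, 10)

for t in range(T):
    stale = history[max(0, t - tau)]   # gradient computed on a stale iterate
    history.append(history[-1] - lr * grad(stale))

print("final ||x||:", np.linalg.norm(history[-1]))  # near 0 despite the delay
```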
arXiv Detail & Related papers (2021-07-06T21:59:49Z)