Multi-GPU SNN Simulation with Perfect Static Load Balancing
- URL: http://arxiv.org/abs/2102.04681v1
- Date: Tue, 9 Feb 2021 07:07:34 GMT
- Title: Multi-GPU SNN Simulation with Perfect Static Load Balancing
- Authors: Dennis Bautembach, Iason Oikonomidis, Antonis Argyros
- Abstract summary: We present an SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs.
This is made possible by 1) a novel, cache-aware spike transmission algorithm, 2) a model-parallel multi-GPU distribution scheme, and 3) a static, yet very effective, load balancing strategy.
- Score: 0.8360870648463651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an SNN simulator which scales to millions of neurons, billions of
synapses, and 8 GPUs. This is made possible by 1) a novel, cache-aware spike
transmission algorithm, 2) a model-parallel multi-GPU distribution scheme, and 3)
a static, yet very effective, load balancing strategy. The simulator further
features an easy-to-use API and the ability to create custom models. We compare
the proposed simulator against two state-of-the-art simulators on a series of
benchmarks using three well-established models. We find that our simulator is
faster, consumes less memory, and scales linearly with the number of GPUs.
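The abstract does not detail the static load balancing strategy, but the stated goal (a fixed, up-front assignment of the model across GPUs) can be illustrated with a minimal sketch. The greedy prefix partitioner below is an assumption for illustration, not the paper's actual algorithm: it assigns contiguous neuron ranges to GPUs so each partition carries roughly the same number of synapses, since synapse count (not neuron count) dominates per-GPU work.

```python
def partition_neurons(synapse_counts, n_gpus):
    """Split neurons into n_gpus contiguous partitions with roughly
    equal synapse totals. synapse_counts[i] is the number of outgoing
    synapses of neuron i."""
    total = sum(synapse_counts)
    target = total / n_gpus  # ideal synapse load per GPU
    partitions, current, acc = [], [], 0
    for neuron, count in enumerate(synapse_counts):
        current.append(neuron)
        acc += count
        # Close the partition once it reaches its share, keeping one
        # partition slot available for each remaining GPU.
        if acc >= target and len(partitions) < n_gpus - 1:
            partitions.append(current)
            current, acc = [], 0
    partitions.append(current)
    return partitions
```

Because the assignment depends only on the (fixed) network topology, it can be computed once before the simulation starts, which is what makes the balancing "static".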
Related papers
- ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI [27.00155119759743]
ManiSkill3 is the fastest state-visual GPU parallelized robotics simulator with contact-rich physics targeting generalizable manipulation.
ManiSkill3 supports GPU parallelization of many aspects including simulation+rendering, heterogeneous simulation, pointclouds/voxels visual input, and more.
arXiv Detail & Related papers (2024-10-01T06:10:39Z) - Optimizing Data Collection in Deep Reinforcement Learning [4.9709347068704455]
GPU vectorization can achieve up to $1024\times$ speedup over commonly used CPU simulators.
We show that simulator kernel fusion speedups with a simple simulator are $11.3\times$ and increase by up to $1024\times$ as simulator complexity increases in terms of memory bandwidth requirements.
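The vectorization speedup described above comes from replacing a Python loop over simulator instances with a single batched array update. The point-mass simulator below is a hypothetical minimal example (not the paper's code) showing the pattern: one fused operation advances all environments at once.

```python
import numpy as np

def step_batched(positions, velocities, dt=0.01):
    """Advance a batch of simple point-mass simulators in one
    vectorized update instead of looping over environments."""
    return positions + velocities * dt  # all environments at once

# Usage: 1024 environments advanced by a single array operation.
pos = np.zeros((1024, 3))
vel = np.ones((1024, 3))
pos = step_batched(pos, vel)
```

Fusing several such updates into one kernel additionally avoids writing intermediate arrays to memory, which is why the reported gains grow with the simulator's memory bandwidth requirements.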
arXiv Detail & Related papers (2022-07-15T20:22:31Z) - PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning.
However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware.
PLSSVM can be used as a drop-in replacement for LIBSVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z) - TensorLy-Quantum: Quantum Machine Learning with Tensor Methods [67.29221827422164]
We create a Python library for quantum circuit simulation that adopts the PyTorch API.
TensorLy-Quantum can scale to hundreds of qubits on a single GPU and thousands of qubits on multiple GPUs.
arXiv Detail & Related papers (2021-12-19T19:26:17Z) - Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous
Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z) - PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based
Memory Management [19.341284825473558]
Pre-trained models (PTMs) are revolutionizing Artificial Intelligence (AI) technology.
A PTM learns general language features from vast amounts of text and is then fine-tuned on a task-specific dataset.
PatrickStar reduces memory requirements of computing platforms by using heterogeneous memory space.
arXiv Detail & Related papers (2021-08-12T15:58:12Z) - Megaverse: Simulating Embodied Agents at One Million Experiences per
Second [75.1191260838366]
We present Megaverse, a new 3D simulation platform for reinforcement learning and embodied AI research.
Megaverse is up to 70x faster than DeepMind Lab in fully-shaded 3D scenes with interactive objects.
We use Megaverse to build a new benchmark that consists of several single-agent and multi-agent tasks.
arXiv Detail & Related papers (2021-07-17T03:16:25Z) - BayesSimIG: Scalable Parameter Inference for Adaptive Domain
Randomization with IsaacGym [59.53949960353792]
BayesSimIG is a library that provides an implementation of BayesSim integrated with the recently released NVIDIA IsaacGym.
BayesSimIG provides an integration with TensorBoard to easily visualize slices of high-dimensional posteriors.
arXiv Detail & Related papers (2021-07-09T16:21:31Z) - Even Faster SNN Simulation with Lazy+Event-driven Plasticity and Shared
Atomics [0.8360870648463651]
We present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators.
The first one targets spike timing dependent plasticity (STDP) and efficiently facilitates the computation of pre- and post-synaptic spikes.
The second optimization targets spike delivery: we partition our graph representation in a way that minimizes the number of neurons that need to be updated at any given time.
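The spike-delivery idea above can be illustrated with a common clock-based technique: queue spikes by their arrival step so each simulation step touches only the neurons whose inputs actually arrive then. This is an assumed sketch of the general approach, not the paper's specific partitioning scheme.

```python
from collections import defaultdict

def deliver(spikes, delays, step):
    """spikes: list of (source, target, fire_step) tuples.
    delays: dict mapping (source, target) edges to integer delays.
    Returns the targets whose input spike arrives exactly at `step`,
    so only those neurons need to be updated."""
    queue = defaultdict(list)
    for src, tgt, fired in spikes:
        queue[fired + delays[(src, tgt)]].append(tgt)
    return queue[step]
```

Bucketing by arrival step means a step with no arriving spikes does no per-neuron work at all, which is the key to keeping clock-based simulation cheap.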
arXiv Detail & Related papers (2021-07-08T20:13:54Z) - Efficient Large-Scale Language Model Training on GPU Clusters [19.00915720435389]
Large language models have led to state-of-the-art accuracies across a range of tasks.
GPU memory capacity is limited, making it impossible to fit large models on a single GPU.
The number of compute operations required to train these models can result in unrealistically long training times.
arXiv Detail & Related papers (2021-04-09T16:43:11Z) - Large Batch Simulation for Deep Reinforcement Learning [101.01408262583378]
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.