STANNIS: Low-Power Acceleration of Deep Neural Network Training Using
Computational Storage
- URL: http://arxiv.org/abs/2002.07215v2
- Date: Wed, 19 Feb 2020 18:56:52 GMT
- Title: STANNIS: Low-Power Acceleration of Deep Neural Network Training Using
Computational Storage
- Authors: Ali HeydariGorji, Mahdi Torabzadehkashi, Siavash Rezaei, Hossein
Bobarshad, Vladimir Alves, Pai H. Chou
- Abstract summary: This paper proposes a framework for distributed, in-storage training of neural networks on clusters of computational storage devices.
Such devices not only contain hardware accelerators but also eliminate data movement between the host and storage, resulting in both improved performance and power savings.
- Score: 1.4680035572775534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a framework for distributed, in-storage training of
neural networks on clusters of computational storage devices. Such devices not
only contain hardware accelerators but also eliminate data movement between the
host and storage, resulting in both improved performance and power savings.
More importantly, this in-storage processing style of training ensures that
private data never leaves the storage while fully controlling the sharing of
public data. Experimental results show up to a 2.7x speedup, a 69% reduction in
energy consumption, and no significant loss in accuracy.
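
The abstract describes a data-parallel, in-storage training setup in which each computational storage device trains on the data it holds and only model updates leave the drive. Below is a minimal sketch of that pattern, assuming synchronous gradient averaging over a toy logistic-regression model; the worker class, shard sizes, and hyperparameters are illustrative and not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class CSDWorker:
    """Simulates one computational storage device holding a private data shard."""
    def __init__(self, X, y):
        self.X, self.y = X, y          # the raw data never leaves this object

    def local_gradient(self, w):
        # Logistic-regression gradient computed on the local shard only.
        p = 1.0 / (1.0 + np.exp(-self.X @ w))
        return self.X.T @ (p - self.y) / len(self.y)

# Illustrative private shards, one per drive.
workers = [CSDWorker(rng.normal(size=(256, 16)),
                     rng.integers(0, 2, size=256).astype(float))
           for _ in range(4)]

w = np.zeros(16)
lr = 0.5
for step in range(100):
    # Only gradients (model updates) cross the device boundary, mirroring
    # the "private data never leaves the storage" property.
    grads = [wk.local_gradient(w) for wk in workers]
    w -= lr * np.mean(grads, axis=0)
```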
Related papers
- Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks [60.54852710216738]
We introduce a novel digital twin-assisted optimization framework, called D-REC, to ensure reliable caching in nextG wireless networks.
By incorporating reliability modules into a constrained decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints.
arXiv Detail & Related papers (2024-06-29T02:40:28Z)
- NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes [50.00272243518593]
Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains problematically high.
We have developed NeuralFuse, a novel add-on module that handles the energy-accuracy tradeoff in low-voltage regimes.
At a 1% bit-error rate, NeuralFuse can reduce access energy by up to 24% while recovering accuracy by up to 57%.
arXiv Detail & Related papers (2023-06-29T11:38:22Z)
- Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings [113.38884267189871]
Training and inference on edge devices often requires an efficient setup due to computational limitations.
Pre-computing data representations and caching them on a server can mitigate extensive edge device computation.
We propose a simple, yet effective approach that uses randomly hyperplane projections.
We show that the embeddings remain effective for training models across various English and German sentence classification tasks, retaining 94%--99% of their floating-point performance (see the random-projection sketch after this list).
arXiv Detail & Related papers (2023-03-13T10:53:00Z)
- RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference [17.299835585861747]
We introduce RRNet, a framework that aims to jointly reduce the overhead of MPC comparison protocols and accelerate computation through hardware acceleration.
Our approach integrates the hardware latency of cryptographic building blocks into the DNN loss function, resulting in improved energy efficiency, accuracy, and security guarantees.
arXiv Detail & Related papers (2023-02-05T04:02:13Z)
- DIVISION: Memory Efficient Training via Dual Activation Precision [60.153754740511864]
State-of-the-art work combines a search of quantization bit-width with the training, which makes the procedure complicated and less transparent.
We propose a simple and effective method to compress DNN training.
Experiment results show DIVISION has better comprehensive performance than state-of-the-art methods, including over 10x compression of activation maps and competitive training throughput, without loss of model accuracy.
arXiv Detail & Related papers (2022-08-05T03:15:28Z)
- Generalized Key-Value Memory to Flexibly Adjust Redundancy in Memory-Augmented Networks [6.03025980398201]
Memory-augmented neural networks enhance a neural network with an external key-value memory.
We propose a generalized key-value memory that decouples its dimension from the number of support vectors.
We show that adapting this parameter on demand effectively mitigates up to 44% nonidealities at equal accuracy and device count (see the key-value memory sketch after this list).
arXiv Detail & Related papers (2022-03-11T19:59:43Z)
- Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
- Binarized Neural Architecture Search for Efficient Object Recognition [120.23378346337311]
Binarized neural architecture search (BNAS) produces extremely compressed models to reduce huge computational cost on embedded devices for edge computing.
An accuracy of 96.53% vs. 97.22% is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a 40% faster search than the state-of-the-art PC-DARTS.
arXiv Detail & Related papers (2020-09-08T15:51:23Z)
- HyperTune: Dynamic Hyperparameter Tuning For Efficient Distribution of DNN Training Over Heterogeneous Systems [1.4680035572775532]
This paper describes distributed training of deep neural networks (DNNs) on computational storage devices (CSDs).
A CSD-based distributed architecture incorporates the advantages of federated learning in terms of performance scalability, resiliency, and data privacy.
The paper also describes Stannis, a DNN training framework that addresses the shortcomings of existing distributed training frameworks.
arXiv Detail & Related papers (2020-07-16T02:12:44Z)
- Efficient Memory Management for Deep Neural Net Inference [0.0]
Deep neural net inference can now be moved to mobile and embedded devices, which is desirable for reasons ranging from latency to privacy.
These devices are limited not only in compute power and battery but also in physical memory and cache, so an efficient memory manager becomes a crucial component of deep neural net inference at the edge (see the buffer-sharing sketch after this list).
arXiv Detail & Related papers (2020-01-10T02:45:41Z)
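
The hashed-random-projection entry above binarizes cached sentence embeddings with random hyperplanes. Below is a minimal sketch of that LSH-style quantization, assuming sign-of-projection bits packed into bytes; the embedding and code sizes are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are cached float32 sentence embeddings (n x d).
emb = rng.normal(size=(1000, 768)).astype(np.float32)

# Random hyperplanes: one sign bit per output dimension.
n_bits = 256
planes = rng.normal(size=(768, n_bits)).astype(np.float32)

# Hashed random projection: keep only which side of each hyperplane the point falls on.
bits = (emb @ planes) > 0                  # boolean (n x n_bits)
packed = np.packbits(bits, axis=1)         # 32 bytes per sentence vs. ~3 KB float32

# A downstream classifier can train directly on the +/-1 codes.
codes = bits.astype(np.float32) * 2.0 - 1.0
```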
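The generalized key-value memory entry decouples the key dimension from the number of stored support vectors so that redundancy can be tuned. The sketch below shows a similarity-based key-value read where the key width is a free parameter; the random bipolar keys and the softmax sharpening factor are assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

n_support, d_value = 32, 64        # number of stored items and value width
d_key = 512                        # key dimension, chosen independently of n_support

# Random bipolar keys of width d_key; values are the stored support vectors.
keys = rng.choice([-1.0, 1.0], size=(n_support, d_key))
values = rng.normal(size=(n_support, d_value))

def read(query, beta=8.0):
    """Soft key-value lookup: similarity -> softmax weights -> blended value."""
    sim = keys @ query / d_key                 # cosine-like similarity in [-1, 1]
    w = np.exp(beta * sim)
    w /= w.sum()
    return w @ values

# Querying with a noisy copy of a stored key still retrieves its value;
# larger d_key adds redundancy against noise or device nonidealities.
q = keys[3] + rng.normal(scale=0.5, size=d_key)
out = read(q)                                  # close to values[3]
```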
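The memory-management entry motivates an efficient memory manager for edge inference but does not spell out an algorithm here; one common strategy in this space is to share buffers between intermediate tensors whose lifetimes do not overlap. The greedy planner below is a hedged sketch of that idea, with made-up tensor sizes and lifetimes, and is not the paper's method.

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size: int        # bytes
    first: int       # index of the op that produces it
    last: int        # index of the last op that reads it

def plan(tensors):
    """Greedy buffer sharing: place largest tensors first, reuse a buffer
    whenever no tensor already in it has an overlapping lifetime."""
    buffers = []     # each buffer: {"size": int, "tenants": [Tensor]}
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        for buf in buffers:
            if all(t.last < u.first or u.last < t.first for u in buf["tenants"]):
                buf["tenants"].append(t)
                buf["size"] = max(buf["size"], t.size)
                break
        else:
            buffers.append({"size": t.size, "tenants": [t]})
    return buffers

# Illustrative intermediate tensors of a small network.
ts = [Tensor("conv1", 64_000, 0, 1), Tensor("conv2", 32_000, 1, 2),
      Tensor("conv3", 32_000, 2, 3), Tensor("fc", 4_000, 3, 4)]
buffers = plan(ts)
total = sum(b["size"] for b in buffers)   # 96 KB here vs. 132 KB with one buffer per tensor
```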
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.