How to Train Your Neural Network: A Comparative Evaluation
- URL: http://arxiv.org/abs/2111.04949v1
- Date: Tue, 9 Nov 2021 04:24:42 GMT
- Title: How to Train Your Neural Network: A Comparative Evaluation
- Authors: Shu-Huai Lin, Daniel Nichols, Siddharth Singh, Abhinav Bhatele
- Abstract summary: We discuss and compare current state-of-the-art frameworks for large scale distributed deep learning.
We present empirical results comparing their performance on large image and language training tasks.
Based on our results, we discuss algorithmic and implementation portions of each framework which hinder performance.
- Score: 1.3654846342364304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of deep learning has witnessed a remarkable shift towards extremely
compute- and memory-intensive neural networks. These newer larger models have
enabled researchers to advance state-of-the-art tools across a variety of
fields. This phenomenon has spurred the development of algorithms for
distributed training of neural networks over a larger number of hardware
accelerators. In this paper, we discuss and compare current state-of-the-art
frameworks for large scale distributed deep learning. First, we survey current
practices in distributed learning and identify the different types of
parallelism used. Then, we present empirical results comparing their
performance on large image and language training tasks. Additionally, we
address their statistical efficiency and memory consumption behavior. Based on
our results, we discuss algorithmic and implementation portions of each
framework which hinder performance.
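As context for the kinds of parallelism the paper surveys, below is a minimal, hedged sketch of data parallelism (the most common strategy) using PyTorch's DistributedDataParallel. It is an illustrative example only, not the paper's benchmark code; the model, synthetic dataset, and hyperparameters are placeholders.

```python
# Minimal data-parallel training sketch (illustrative only; not the paper's benchmark code).
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    # Placeholder model and synthetic data; real experiments would use image/language models.
    model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
    model = DDP(model, device_ids=[local_rank] if torch.cuda.is_available() else None)

    data = TensorDataset(torch.randn(4096, 512), torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(data)            # shards the dataset across ranks
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                       # gradients are all-reduced across ranks here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Model, pipeline, and tensor parallelism follow the same pattern of partitioning work across accelerators but split the model itself rather than the batch; the frameworks compared in the paper differ mainly in how they implement and combine these strategies.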
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Deep Learning for Causal Inference: A Comparison of Architectures for Heterogeneous Treatment Effect Estimation [0.5249805590164902]
We develop a fully connected neural network implementation of the popular Bayesian Causal Forest algorithm.
We apply our method to a dataset examining the effect of stress on sleep.
arXiv Detail & Related papers (2024-05-06T02:54:53Z)
- The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by Isolating Task-Specific Subnetworks in Feedforward Neural Networks [0.0]
We identify a methodology and network representational structure which allows a pruned network to employ previously unused weights to learn subsequent tasks.
We show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.
arXiv Detail & Related papers (2022-07-18T15:07:13Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Collaborative Method for Incremental Learning on Classification and Generation [32.07222897378187]
We introduce a novel algorithm, Incremental Class Learning with Attribute Sharing (ICLAS), for incremental class learning with deep neural networks.
One of its components, incGAN, can generate images with greater variety than the training data.
Under the challenging conditions of data deficiency, ICLAS incrementally trains its classification and generation networks.
arXiv Detail & Related papers (2020-10-29T06:34:53Z)
- Exploring Flip Flop memories and beyond: training recurrent neural networks with key insights [0.0]
We study the implementation of a temporal processing task, specifically a 3-bit Flip Flop memory.
The obtained networks are meticulously analyzed to elucidate dynamics, aided by an array of visualization and analysis tools.
arXiv Detail & Related papers (2020-10-15T16:25:29Z)
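As background for the 3-bit flip-flop memory mentioned in the entry above, here is a minimal, hedged sketch of the standard task definition (each of three channels receives sparse ±1 pulses and the target holds the sign of the most recent pulse on that channel) together with a small recurrent network in PyTorch. It is not that paper's code; the pulse rate, network size, and training settings are placeholders.

```python
# Illustrative sketch of the 3-bit flip-flop task (not the cited paper's implementation).
import torch
import torch.nn as nn

def make_flipflop_batch(batch=32, steps=200, channels=3, pulse_prob=0.05):
    """Inputs: sparse +/-1 pulses per channel. Targets: sign of the most recent pulse."""
    pulses = torch.randint(0, 2, (batch, steps, channels)) * 2 - 1        # +/-1 values
    mask = (torch.rand(batch, steps, channels) < pulse_prob).float()
    x = pulses * mask                                                     # mostly zeros
    y = torch.zeros_like(x)
    state = torch.zeros(batch, channels)
    for t in range(steps):
        state = torch.where(mask[:, t] > 0, pulses[:, t].float(), state)  # latch new pulses
        y[:, t] = state
    return x, y

class FlipFlopRNN(nn.Module):
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(channels, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, channels)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.readout(h)

model = FlipFlopRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):          # small training budget, for illustration only
    x, y = make_flipflop_batch()
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```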
- Reservoir Memory Machines as Neural Computers [70.5993855765376]
Differentiable neural computers extend artificial neural networks with an explicit memory without interference.
We achieve some of the computational capabilities of differentiable neural computers with a model that can be trained very efficiently.
arXiv Detail & Related papers (2020-09-14T12:01:30Z)
- Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
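The entry above pairs data parallelism with sparsity; as an illustrative aside (not that paper's experimental setup), the sketch below introduces unstructured sparsity with PyTorch's built-in magnitude pruning utility. The layer sizes and pruning fraction are arbitrary placeholders.

```python
# Illustrative sketch of unstructured magnitude pruning (not the cited paper's setup).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 80% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")   # bake the mask in, making the sparsity permanent

# Report the resulting sparsity of each Linear layer.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        sparsity = (module.weight == 0).float().mean().item()
        print(f"{name}: {sparsity:.0%} zero weights")
```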
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
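For context on what "large learning rate" means in the entry above, the block below recalls the classical stability threshold for gradient descent on a quadratic loss. This is standard background, not that paper's derivation; the catapult regime it studies arises beyond this linearized picture.

```latex
% Classical background (not the cited paper's derivation): gradient descent on a
% quadratic loss along an eigendirection with curvature \lambda.
\[
    L(w) = \tfrac{\lambda}{2} w^2, \qquad
    w_{t+1} = w_t - \eta \, \nabla L(w_t) = (1 - \eta \lambda)\, w_t .
\]
% The iterates contract only when |1 - \eta\lambda| < 1, i.e. for
\[
    0 < \eta < \frac{2}{\lambda},
\]
% so \eta \approx 2/\lambda_{\max} is the usual stability edge; the "catapult" phase
% concerns learning rates above this threshold in wide networks.
```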
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.