Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
- URL: http://arxiv.org/abs/2411.05614v1
- Date: Fri, 08 Nov 2024 14:55:32 GMT
- Title: Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
- Authors: Zhihong Liu, Xin Xu, Peng Qiao, Dongsheng Li,
- Abstract summary: Deep reinforcement learning has led to dramatic breakthroughs in the field of artificial intelligence.
As the amount of rollout experience data and the size of neural networks for deep reinforcement learning have grown, handling the training process and reducing the time consumption using parallel and distributed computing is becoming an urgent and essential desire.
We perform a broad and thorough investigation on training acceleration methodologies for deep reinforcement learning based on parallel and distributed computing.
- Score: 29.275928499337734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning has led to dramatic breakthroughs in the field of artificial intelligence for the past few years. As the amount of rollout experience data and the size of neural networks for deep reinforcement learning have grown continuously, handling the training process and reducing the time consumption using parallel and distributed computing is becoming an urgent and essential desire. In this paper, we perform a broad and thorough investigation on training acceleration methodologies for deep reinforcement learning based on parallel and distributed computing, providing a comprehensive survey in this field with state-of-the-art methods and pointers to core references. In particular, a taxonomy of literature is provided, along with a discussion of emerging topics and open issues. This incorporates learning system architectures, simulation parallelism, computing parallelism, distributed synchronization mechanisms, and deep evolutionary reinforcement learning. Further, we compare 16 current open-source libraries and platforms with criteria of facilitating rapid development. Finally, we extrapolate future directions that deserve further research.
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time"
It is based on differential equations that: (i) can be integrated without the need of external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open to novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - State-Space Modeling in Long Sequence Processing: A Survey on Recurrence in the Transformer Era [59.279784235147254]
This survey provides an in-depth summary of the latest approaches that are based on recurrent models for sequential data processing.
The emerging picture suggests that there is room for thinking of novel routes, constituted by learning algorithms which depart from the standard Backpropagation Through Time.
arXiv Detail & Related papers (2024-06-13T12:51:22Z) - A Survey of Distributed Learning in Cloud, Mobile, and Edge Settings [1.0589208420411014]
This survey explores the landscape of distributed learning, encompassing cloud and edge settings.
We delve into the core concepts of data and model parallelism, examining how models are partitioned across different dimensions and layers to optimize resource utilization and performance.
We analyze various partitioning schemes for different layer types, including fully connected, convolutional, and recurrent layers, highlighting the trade-offs between computational efficiency, communication overhead, and memory constraints.
arXiv Detail & Related papers (2024-05-23T22:00:38Z) - Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey [43.57122822150023]
This article surveys the literature on algorithms and technologies aimed at achieving efficient communication in large-scale distributed deep learning.
We first introduce efficient algorithms for model synchronization and communication data compression in the context of large-scale distributed training.
Next, we introduce efficient strategies related to resource allocation and task scheduling for use in distributed training and inference.
arXiv Detail & Related papers (2024-04-09T08:35:04Z) - Privacy-Preserving Serverless Edge Learning with Decentralized Small
Data [13.254530176359182]
Distributed training strategies have recently become a promising approach to ensure data privacy when training deep models.
This paper extends conventional serverless platforms with serverless edge learning architectures and provides an efficient distributed training framework from the networking perspective.
arXiv Detail & Related papers (2021-11-29T21:04:49Z) - Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
arXiv Detail & Related papers (2020-12-07T16:38:45Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural
Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z) - Fiber: A Platform for Efficient Development and Distributed Training for
Reinforcement Learning and Population-Based Methods [15.263746839710715]
Reinforcement learning (RL) and population-based methods pose unique challenges for efficiency and flexibility to the underlying distributed computing frameworks.
We introduce Fiber, a scalable distributed computing framework for RL and population-based methods.
arXiv Detail & Related papers (2020-03-25T00:28:48Z) - Communication-Efficient Distributed Deep Learning: A Comprehensive
Survey [22.42450750097714]
We provide a comprehensive survey of the communication-efficient distributed training algorithms.
We first propose a taxonomy of data-parallel distributed training algorithms.
We then investigate state-of-the-art studies that address problems in these four dimensions.
arXiv Detail & Related papers (2020-03-10T05:42:44Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z) - Distributed Learning in the Non-Convex World: From Batch to Streaming
Data, and Beyond [73.03743482037378]
Distributed learning has become a critical direction of the massively connected world envisioned by many.
This article discusses four key elements of scalable distributed processing and real-time data computation problems.
Practical issues and future research will also be discussed.
arXiv Detail & Related papers (2020-01-14T14:11:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.