Accelerated Quality-Diversity for Robotics through Massive Parallelism
- URL: http://arxiv.org/abs/2202.01258v1
- Date: Wed, 2 Feb 2022 19:44:17 GMT
- Title: Accelerated Quality-Diversity for Robotics through Massive Parallelism
- Authors: Bryan Lim, Maxime Allard, Luca Grillotti, Antoine Cully
- Abstract summary: Policy evaluations are already commonly performed in parallel to speed up QD algorithms but have limited capabilities on a single machine.
With recent advances in simulators that run on accelerators, thousands of evaluations can be performed in parallel on a single GPU/TPU.
We show that QD algorithms are ideal candidates and can scale with massive parallelism to be run at interactive timescales.
- Score: 4.260312058817663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quality-Diversity (QD) algorithms are a well-known approach to generate large
collections of diverse and high-quality policies. However, QD algorithms are
also known to be data-inefficient, requiring large amounts of computational
resources, and are slow when used in practice for robotics tasks. Policy
evaluations are already commonly performed in parallel to speed up QD
algorithms but have limited capabilities on a single machine as most physics
simulators run on CPUs. With recent advances in simulators that run on
accelerators, thousands of evaluations can be performed in parallel on a single
GPU/TPU. In this paper, we present QDax, an implementation of MAP-Elites which
leverages massive parallelism on accelerators to make QD algorithms more
accessible. We first demonstrate the improvements in the number of evaluations
per second that parallelism using accelerated simulators can offer. More
importantly, we show that QD algorithms are ideal candidates and can scale with
massive parallelism to be run at interactive timescales. The increase in
parallelism does not significantly affect the performance of QD algorithms,
while reducing experiment runtimes by two orders of magnitude, turning days
of computation into minutes. These results show that QD can now benefit from
hardware acceleration, which contributed significantly to the bloom of deep
learning.
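The batched loop described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the QDax API: the function names (`evaluate`, `cell_index`, `map_elites`), the toy fitness, and the two-dimensional descriptor are all assumptions made for the example. The list comprehension that evaluates the batch stands in for the step that an accelerated simulator would run as a single parallel call.

```python
import random


def evaluate(x):
    """Toy stand-in for a policy rollout (assumption, not a robotics task).

    Returns (fitness, descriptor). Fitness is highest at the origin;
    the descriptor is the first two genes, each in [0, 1].
    """
    fitness = -sum(v * v for v in x)
    descriptor = (x[0], x[1])
    return fitness, descriptor


def cell_index(descriptor, bins):
    """Map a descriptor in [0, 1]^2 to a cell of a bins x bins grid."""
    return tuple(min(int(d * bins), bins - 1) for d in descriptor)


def map_elites(batch_size, iterations, dim=4, bins=8, sigma=0.1, seed=0):
    """Batched MAP-Elites: each iteration evaluates `batch_size` candidates."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, genotype)

    for _ in range(iterations):
        if not archive:
            # First iteration: sample the whole batch uniformly at random.
            batch = [[rng.random() for _ in range(dim)]
                     for _ in range(batch_size)]
        else:
            # Select parents uniformly from the archive and mutate them
            # with Gaussian noise, clamped to [0, 1].
            elites = [genotype for _, genotype in archive.values()]
            batch = [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma)))
                      for v in rng.choice(elites)]
                     for _ in range(batch_size)]

        # Evaluation step. With an accelerated simulator this would be
        # one vectorized call over the whole batch instead of a Python loop.
        results = [evaluate(x) for x in batch]

        # Archive update: keep a candidate if its cell is empty or it
        # beats the elite currently stored in that cell.
        for x, (fitness, descriptor) in zip(batch, results):
            cell = cell_index(descriptor, bins)
            if cell not in archive or fitness > archive[cell][0]:
                archive[cell] = (fitness, x)

    return archive
```

With a fixed evaluation budget, `batch_size` and `iterations` trade off against each other: `map_elites(batch_size=64, iterations=20)` and `map_elites(batch_size=1280, iterations=1)` both perform 1280 evaluations, but the second has only one sequential step. The abstract's claim is that moving toward larger batches does not significantly hurt the final archive; this toy sketch only shows where the batch size enters the loop and does not reproduce that result.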
Related papers
- Lazy Qubit Reordering for Accelerating Parallel State-Vector-based Quantum Circuit Simulation [0.0]
Two quantum operation scheduling methods are proposed for quantum circuit simulation.
The proposed methods reduce all-to-all communication caused by qubit reordering.
We develop these methods tailored for two primary procedures in variational quantum eigensolver (VQE) simulation.
arXiv Detail & Related papers (2024-10-05T18:20:37Z)
- Benchmarking Edge AI Platforms for High-Performance ML Inference [0.0]
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions.
While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads can vary significantly.
We compare the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions.
arXiv Detail & Related papers (2024-09-23T08:27:27Z)
- Automatic Task Parallelization of Dataflow Graphs in ML/DL models [0.0]
We present a Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs.
We generate readable and executable parallel Pytorch+Python code from input ML models in ONNX format.
Preliminary results on several ML graphs demonstrate up to 1.9$\times$ speedup over serial execution.
arXiv Detail & Related papers (2023-08-22T04:54:30Z)
- Performance and Energy Consumption of Parallel Machine Learning Algorithms [0.0]
Machine learning models have achieved remarkable success in various real-world applications.
Model training in machine learning requires large-scale data sets and multiple iterations before it can work properly.
Parallelization of training algorithms is a common strategy to speed up the process of training.
arXiv Detail & Related papers (2023-05-01T13:04:39Z)
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample at the time in which it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
- Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications.
We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
- Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for 5G and Beyond [70.81551587109833]
Nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity.
One of the main challenges comes from the real-time implementation of these algorithms.
This paper explores the acceleration of APSM-based algorithms through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z)
- Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms [1.3249453757295084]
We study training algorithms for deep learning on heterogeneous CPU+GPU architectures.
Our two-fold objective -- maximize convergence rate and resource utilization simultaneously -- makes the problem challenging.
We show that the implementation of these algorithms achieves both faster convergence and higher resource utilization on several real datasets.
arXiv Detail & Related papers (2020-04-19T05:21:20Z)
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.