Related papers: Accelerated Quality-Diversity for Robotics through Massive Parallelism

URL: http://arxiv.org/abs/2202.01258v1
Date: Wed, 2 Feb 2022 19:44:17 GMT
Title: Accelerated Quality-Diversity for Robotics through Massive Parallelism
Authors: Bryan Lim, Maxime Allard, Luca Grillotti, Antoine Cully
Abstract summary: Policy evaluations are already commonly performed in parallel to speed up QD algorithms, but this has limited capabilities on a single machine. With recent advances in simulators that run on accelerators, thousands of evaluations can be performed in parallel on a single GPU/TPU. We show that QD algorithms are ideal candidates and can scale with massive parallelism to be run at interactive timescales.
Score: 4.260312058817663
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Quality-Diversity (QD) algorithms are a well-known approach to generate large collections of diverse and high-quality policies. However, QD algorithms are also known to be data-inefficient: they require large amounts of computational resources and are slow when used in practice for robotics tasks. Policy evaluations are already commonly performed in parallel to speed up QD algorithms, but this has limited capabilities on a single machine, as most physics simulators run on CPUs. With recent advances in simulators that run on accelerators, thousands of evaluations can be performed in parallel on a single GPU/TPU. In this paper, we present QDax, an implementation of MAP-Elites which leverages massive parallelism on accelerators to make QD algorithms more accessible. We first demonstrate the improvements in the number of evaluations per second that parallelism using accelerated simulators can offer. More importantly, we show that QD algorithms are ideal candidates and can scale with massive parallelism to be run at interactive timescales. The increase in parallelism does not significantly affect the performance of QD algorithms, while reducing experiment runtimes by two orders of magnitude, turning days of computation into minutes. These results show that QD can now benefit from hardware acceleration, which contributed significantly to the bloom of deep learning.
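
To make the batched structure described in the abstract concrete, here is a minimal MAP-Elites sketch in plain Python. It is not QDax and does not use its API: the toy `evaluate` function, the 2D genome, the grid resolution `bins`, and the mutation scale `sigma` are all assumptions chosen for illustration. The point it shows is that every genome in a batch is evaluated independently, which is the step an accelerated simulator could run in parallel.

```python
import random


def evaluate(genome):
    """Toy stand-in for a policy rollout (assumption, not a robotics task).

    Returns (fitness, behavior descriptor) for a genome in [-1, 1]^2.
    """
    x, y = genome
    fitness = -(x * x + y * y)  # higher is better, best at the origin
    descriptor = (x, y)         # here the descriptor is simply the genome
    return fitness, descriptor


def cell_index(descriptor, bins):
    """Map a descriptor in [-1, 1]^2 to a cell of a bins x bins grid."""
    return tuple(min(bins - 1, int((d + 1.0) / 2.0 * bins)) for d in descriptor)


def map_elites(iterations, batch_size, bins=8, sigma=0.1, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, genome)
    for _ in range(iterations):
        # Selection and variation: build a whole batch of candidates.
        if archive:
            elites = list(archive.values())
            batch = []
            for _ in range(batch_size):
                parent = rng.choice(elites)[1]
                child = tuple(
                    max(-1.0, min(1.0, g + rng.gauss(0.0, sigma))) for g in parent
                )
                batch.append(child)
        else:
            batch = [
                (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
                for _ in range(batch_size)
            ]
        # Evaluation: each call is independent of the others, so this loop
        # is the part that could be executed in parallel on an accelerator.
        results = [evaluate(g) for g in batch]
        # Archive update: keep the best genome seen so far in each cell.
        for genome, (fitness, descriptor) in zip(batch, results):
            cell = cell_index(descriptor, bins)
            if cell not in archive or fitness > archive[cell][0]:
                archive[cell] = (fitness, genome)
    return archive


archive = map_elites(iterations=20, batch_size=64)
```

In this sketch a larger `batch_size` with fewer `iterations` keeps the total number of evaluations fixed while reducing the number of sequential steps, which is the trade-off the abstract refers to.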

Related papers

LuGo: an Enhanced Quantum Phase Estimation Implementation [2.45000454920926]
We introduce LuGo, a novel framework designed to enhance the performance of Quantum Phase Estimation. LuGo achieves significant improvements in both computational efficiency and hardware requirements. With these advantages, LuGo paves the way for more efficient implementations of QPE, enabling broader applications across several quantum computing domains.
arXiv Detail & Related papers (2025-03-19T17:19:24Z)
Pushing the Boundary of Quantum Advantage in Hard Combinatorial Optimization with Probabilistic Computers [0.4969640751053581]
We show that p-computers can surpass state-of-the-art quantum annealers in solving hard optimization problems. We show that these algorithms are readily implementable in modern hardware thanks to the mature semiconductor technology. Our results raise the bar for a practical quantum advantage in optimization and present p-computers as scalable, energy-efficient hardware.
arXiv Detail & Related papers (2025-03-13T12:24:13Z)
Parallelizing the stabilizer formalism for quantum machine learning applications [0.4749824105387292]
The proposed Python implementation is 4.23 times faster than Qiskit, the current simulator, in the case of 4 qubits and 60,2K gates.
arXiv Detail & Related papers (2025-02-15T06:10:07Z)
Lazy Qubit Reordering for Accelerating Parallel State-Vector-based Quantum Circuit Simulation [0.0]
Two quantum operation scheduling methods are proposed for quantum circuit simulation. The proposed methods reduce all-to-all communication caused by qubit reordering. We develop these methods tailored for two primary procedures in variational quantum eigensolver (VQE) simulation.
arXiv Detail & Related papers (2024-10-05T18:20:37Z)
Benchmarking Edge AI Platforms for High-Performance ML Inference [0.0]
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads can vary significantly. We compare the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions.
arXiv Detail & Related papers (2024-09-23T08:27:27Z)
Automatic Task Parallelization of Dataflow Graphs in ML/DL models [0.0]
We present a Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs. We generate readable and executable parallel PyTorch+Python code from input ML models in ONNX format. Preliminary results on several ML graphs demonstrate up to 1.9$\times$ speedup over serial execution.
arXiv Detail & Related papers (2023-08-22T04:54:30Z)
Performance and Energy Consumption of Parallel Machine Learning Algorithms [0.0]
Machine learning models have achieved remarkable success in various real-world applications. Model training in machine learning requires large-scale data sets and multiple iterations before it can work properly. Parallelization of training algorithms is a common strategy to speed up the process of training.
arXiv Detail & Related papers (2023-05-01T13:04:39Z)
PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time. PARTIME starts processing each data sample at the time in which it becomes available from the stream. Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
Real-Time GPU-Accelerated Machine Learning Based Multiuser Detection for 5G and Beyond [70.81551587109833]
Nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity. One of the main challenges comes from the real-time implementation of these algorithms. This paper explores the acceleration of APSM-based algorithms through massive parallelization.
arXiv Detail & Related papers (2022-01-13T15:20:45Z)
Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls. We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z)
Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms [1.3249453757295084]
We study training algorithms for deep learning on heterogeneous CPU+GPU architectures. Our two-fold objective -- maximize convergence rate and resource utilization simultaneously -- makes the problem challenging. We show that the implementation of these algorithms achieves both faster convergence and higher resource utilization on several real datasets.
arXiv Detail & Related papers (2020-04-19T05:21:20Z)
Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning. We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both. Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
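
The Jacobi fixed-point idea in the last entry above can be illustrated with a small sketch. It treats a feedforward chain s_t = f_t(s_{t-1}) as a system of equations and updates every state at once from the previous iterate. The toy `tanh` layers and the names `sequential` and `jacobi` are assumptions for illustration, not code from that paper.

```python
import math


def make_layers():
    """Hypothetical toy chain of scalar layers (assumption for illustration)."""
    return [lambda s, a=a: math.tanh(a * s + 0.1) for a in (0.5, -1.2, 0.8, 1.5, -0.3)]


def sequential(fs, s0):
    """Ordinary feedforward pass: T strictly sequential steps."""
    states, s = [], s0
    for f in fs:
        s = f(s)
        states.append(s)
    return states


def jacobi(fs, s0):
    """Jacobi fixed-point iteration on the system s_t = f_t(s_{t-1}).

    Within one iteration all T updates read only the previous iterate,
    so they are independent and could be computed in parallel.
    """
    T = len(fs)
    states = [0.0] * T  # arbitrary initial guess
    iters = 0
    for iters in range(1, T + 1):
        prev = [s0] + states[:-1]
        new_states = [f(p) for f, p in zip(fs, prev)]
        if new_states == states:  # fixed point reached early
            break
        states = new_states
    return states, iters


fs = make_layers()
reference = sequential(fs, 0.3)
parallel_states, n_iters = jacobi(fs, 0.3)
```

After k iterations the first k states already match the sequential pass, so at most T iterations are needed. Any saving in wall-clock time depends on running the updates within an iteration in parallel and on the iteration stopping early.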

This list is automatically generated from the titles and abstracts of the papers on this site.