Python Wrapper for Simulating Multi-Fidelity Optimization on HPO
Benchmarks without Any Wait
- URL: http://arxiv.org/abs/2305.17595v2
- Date: Thu, 29 Jun 2023 16:27:23 GMT
- Title: Python Wrapper for Simulating Multi-Fidelity Optimization on HPO
Benchmarks without Any Wait
- Authors: Shuhei Watanabe
- Abstract summary: We develop a Python wrapper that forces each worker to wait so that we yield exactly the same evaluation order as in the real experiment with only $10^{-2}$ seconds of waiting instead of waiting several hours.
- Score: 1.370633147306388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter (HP) optimization of deep learning (DL) is essential for high
performance. As DL often requires several hours to days for its training, HP
optimization (HPO) of DL is often prohibitively expensive. This boosted the
emergence of tabular or surrogate benchmarks, which enable querying the
(predictive) performance of DL with a specific HP configuration in a fraction of a second.
However, since the actual runtime of a DL training is significantly different
from its query response time, simulators of an asynchronous HPO, e.g.
multi-fidelity optimization, must wait for the actual runtime at each iteration
in a na\"ive implementation; otherwise, the evaluation order during simulation
does not match with the real experiment. To ease this issue, we developed a
Python wrapper and describe its usage. This wrapper forces each worker to wait
so that we yield exactly the same evaluation order as in the real experiment
with only $10^{-2}$ seconds of waiting instead of waiting several hours. Our
implementation is available at
https://github.com/nabenabe0928/mfhpo-simulator/.
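To make the mechanism concrete, here is a minimal sketch, assuming a toy tabular benchmark, of how an asynchronous multi-fidelity run can be replayed without real waiting: each worker keeps a simulated clock that advances by the benchmark's recorded runtime, and the worker with the earliest simulated finish time always returns next, which reproduces the evaluation order of the real experiment instantly. This is not the mfhpo-simulator API; query_benchmark and the scheduling loop below are illustrative assumptions.

```python
import heapq
import random

# Illustrative stand-in for a tabular/surrogate benchmark lookup: it returns
# the recorded loss and the *actual* training runtime in seconds.
def query_benchmark(config):
    loss = (config["lr"] - 0.1) ** 2           # toy objective
    runtime = 3600.0 * (1.0 + config["lr"])    # pretend training took hours
    return loss, runtime

def simulate_async_hpo(n_workers=4, n_evals=20, seed=0):
    rng = random.Random(seed)
    # Priority queue keyed by each worker's *simulated* wall-clock time.
    ready = [(0.0, wid) for wid in range(n_workers)]
    heapq.heapify(ready)
    history = []
    for _ in range(n_evals):
        # The worker whose simulated clock is earliest finishes next,
        # which is exactly the order a real asynchronous run would produce.
        sim_time, wid = heapq.heappop(ready)
        config = {"lr": rng.uniform(0.0, 1.0)}
        loss, runtime = query_benchmark(config)
        sim_time += runtime                     # advance the simulated clock only
        history.append((sim_time, wid, config, loss))
        heapq.heappush(ready, (sim_time, wid))
    return history

if __name__ == "__main__":
    for sim_time, wid, config, loss in simulate_async_hpo():
        print(f"t={sim_time:10.1f}s worker={wid} lr={config['lr']:.3f} loss={loss:.4f}")
```

The actual wrapper realizes the same ordering with multiple real worker processes by forcing each one to wait only about $10^{-2}$ seconds per evaluation; the sketch above collapses that into a single-process replay.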
Related papers
- Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks [40.8406006936244]
We introduce a Python package that facilitates efficient parallel HPO with zero-cost benchmarks.
Our approach calculates the exact return order based on the information stored in the file system (a rough sketch of such file-based bookkeeping appears after this list).
Our package can be installed via pip install mfhpo-simulator.
arXiv Detail & Related papers (2024-03-04T09:49:35Z) - Green AI: A Preliminary Empirical Study on Energy Consumption in DL
Models Across Different Runtime Infrastructures [56.200335252600354]
It is common practice to deploy pre-trained models on environments distinct from their native development settings.
This led to the introduction of interchange formats such as ONNX, along with runtime infrastructures such as ONNX Runtime, which serve as deployment standards.
arXiv Detail & Related papers (2024-02-21T09:18:44Z) - Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under
Massively Parallel Simulation [17.827002299991285]
Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data.
Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU.
This paper presents a Parallel $Q$-Learning scheme that outperforms PPO in wall-clock time.
arXiv Detail & Related papers (2023-07-24T17:59:37Z) - Python Tool for Visualizing Variability of Pareto Fronts over Multiple
Runs [1.370633147306388]
We develop a Python package for empirical attainment surface.
The package is available at https://github.com/nabenabe0928/empirical-attainment-func.
arXiv Detail & Related papers (2023-05-15T17:59:34Z) - PARTIME: Scalable and Parallel Processing Over Time with Deep Neural
Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample at the time in which it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
arXiv Detail & Related papers (2022-10-17T14:49:14Z) - Optimizing Data Collection in Deep Reinforcement Learning [4.9709347068704455]
GPU vectorization can achieve up to $1024\times$ speedup over commonly used CPU simulators.
We show that simulator kernel fusion speedups with a simple simulator are $11.3\times$ and increase by up to $1024\times$ as simulator complexity increases in terms of memory bandwidth requirements.
arXiv Detail & Related papers (2022-07-15T20:22:31Z) - Accelerated Quality-Diversity for Robotics through Massive Parallelism [4.260312058817663]
Policy evaluations are already commonly performed in parallel to speed up QD algorithms but have limited capabilities on a single machine.
With recent advances in simulators that run on accelerators, thousands of evaluations can be performed in parallel on a single GPU/TPU.
We show that QD algorithms are ideal candidates and can scale with massive parallelism to be run at interactive timescales.
arXiv Detail & Related papers (2022-02-02T19:44:17Z) - Large Batch Simulation for Deep Reinforcement Learning [101.01408262583378]
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z) - Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
We find the best model structure of BERT for a given computation size to match specific devices.
Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices.
Specifically, our model is 5.2x faster on CPU and 4.1x faster on GPU with 0.5-2% accuracy loss compared with BERT-base.
arXiv Detail & Related papers (2020-09-15T01:59:17Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL
primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
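As a rough illustration of the file-based bookkeeping described in the first related paper above (mfhpo-simulator on zero-cost benchmarks), the sketch below recovers the return order of an asynchronous run from runtimes that workers have appended to a shared log. The one-JSON-object-per-line file format and the field names are assumptions for illustration, not the package's actual layout.

```python
import json
from pathlib import Path

def return_order(results_path):
    """Recover the real-experiment return order from a shared runtime log.

    Each log line is assumed to hold {"worker": <id>, "runtime": <seconds>, ...}
    in the order the worker submitted its evaluations.
    """
    finish_times = {}   # per-worker simulated clock
    events = []
    with Path(results_path).open() as f:
        for line in f:
            rec = json.loads(line)
            wid = rec["worker"]
            finish_times[wid] = finish_times.get(wid, 0.0) + rec["runtime"]
            events.append({**rec, "finish": finish_times[wid]})
    # Sorting by cumulative (simulated) finish time yields the order in which
    # results would have come back in a real asynchronous run.
    return sorted(events, key=lambda e: e["finish"])
```

Because the order is derived from recorded runtimes rather than real waiting, benchmarking an asynchronous optimizer this way takes seconds instead of the hours the underlying DL trainings would require.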
This list is automatically generated from the titles and abstracts of the papers on this site.