POCAII: Parameter Optimization with Conscious Allocation using Iterative Intelligence
- URL: http://arxiv.org/abs/2505.11745v1
- Date: Fri, 16 May 2025 23:05:07 GMT
- Title: POCAII: Parameter Optimization with Conscious Allocation using Iterative Intelligence
- Authors: Joshua Inman, Tanmay Khandait, Lalitha Sankar, Giulia Pedrielli
- Abstract summary: POCAII is a flexible scheme for managing a hyperparameter optimization budget. It shows superior performance in low-budget hyperparameter optimization regimes and has wide applications to real-world problems.
- Score: 4.478575931884855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we propose for the first time the hyperparameter optimization (HPO) algorithm POCAII. POCAII differs from the Hyperband and Successive Halving literature by explicitly separating the search and evaluation phases and by applying principled exploration and exploitation strategies in both. This distinction yields a highly flexible scheme for managing a hyperparameter optimization budget: search effort (i.e., generating competing configurations) is concentrated toward the start of the HPO process, while evaluation effort increases as the HPO run comes to an end. POCAII was compared to the state-of-the-art approaches SMAC, BOHB, and DEHB. Our algorithm shows superior performance in low-budget hyperparameter optimization regimes; since many practitioners do not have exhaustive resources to devote to HPO, it has wide applicability to real-world problems. Moreover, the empirical evidence shows that POCAII achieves higher robustness and lower variance in its results, which is particularly important in realistic scenarios where models are extremely expensive to train.
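As a concrete illustration of the budget-shifting scheme the abstract describes, here is a minimal Python sketch of a search-then-evaluate loop whose search fraction decays over the run. The linear schedule, toy objective, and all names are illustrative assumptions, not the authors' implementation.

```python
import random

def split_budget(iteration, total_iterations, search_floor=0.1):
    """Fraction of this round's budget spent on search (generating new
    configurations); the rest goes to deeper evaluation of incumbents.
    The linear decay is an assumption, not the paper's schedule."""
    progress = iteration / total_iterations
    return max(search_floor, 1.0 - progress)

def sample_config():
    # Hypothetical 2-D search space: learning rate and dropout.
    return {"lr": 10 ** random.uniform(-5, -1),
            "dropout": random.uniform(0.0, 0.5)}

def evaluate(config, epochs):
    # Stand-in for training for `epochs` and returning a validation loss.
    noise = random.gauss(0, 0.01 / epochs)
    return (config["lr"] - 1e-3) ** 2 + 0.01 * config["dropout"] + noise

pool = []  # [loss, config] pairs, best first after sorting
TOTAL_ROUNDS, ROUND_BUDGET = 20, 10
for t in range(TOTAL_ROUNDS):
    search_frac = split_budget(t, TOTAL_ROUNDS)
    n_search = int(round(search_frac * ROUND_BUDGET))   # new configurations
    n_eval = ROUND_BUDGET - n_search                    # re-evaluations
    for _ in range(n_search):
        cfg = sample_config()
        pool.append([evaluate(cfg, epochs=1), cfg])
    pool.sort(key=lambda pair: pair[0])
    for pair in pool[:n_eval]:        # spend evaluation budget on the current best
        pair[0] = evaluate(pair[1], epochs=5)
    pool.sort(key=lambda pair: pair[0])

print("best configuration:", pool[0][1], "loss:", pool[0][0])
```

Early rounds spend the whole budget generating candidates; late rounds spend most of it re-evaluating the incumbents more deeply, mirroring the search-to-evaluation shift described above.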
Related papers
- Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO [51.22869332661607]
We decompose the performance gap between reinforcement learning from human feedback and direct preference optimization under a representation gap. We show that RLHF, DPO, or online DPO can outperform one another depending on the type of model mis-specification.
arXiv Detail & Related papers (2025-05-26T09:54:02Z)
- ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning [50.53705050673944]
We propose ULTHO, an ultra-lightweight yet powerful framework for fast HPO in deep RL within single runs. Specifically, we formulate the HPO process as a multi-armed bandit with clustered arms (MABC) and link it directly to long-term return optimization. We test ULTHO on benchmarks including ALE, Procgen, MiniGrid, and PyBullet.
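A minimal sketch of the clustered-arms bandit idea mentioned above: a two-level UCB that first picks a cluster of hyperparameter values, then an arm inside it. The hierarchy, the UCB1 statistics, and all names are assumptions for illustration; ULTHO's actual algorithm differs.

```python
import math, random

class TwoLevelUCB:
    """Two-level UCB over clustered arms: pick a cluster, then an arm in it."""
    def __init__(self, clusters):
        self.clusters = clusters                      # {cluster: [arm, ...]}
        self.n = {(c, a): 0 for c in clusters for a in clusters[c]}
        self.mean = {(c, a): 0.0 for c in clusters for a in clusters[c]}
        self.t = 0

    def _ucb(self, n, mean):
        # Standard UCB1 score; unpulled arms are tried first.
        return float("inf") if n == 0 else mean + math.sqrt(2 * math.log(self.t + 1) / n)

    def select(self):
        self.t += 1
        def cluster_score(c):
            n = sum(self.n[(c, a)] for a in self.clusters[c])
            m = 0.0 if n == 0 else sum(self.mean[(c, a)] * self.n[(c, a)]
                                       for a in self.clusters[c]) / n
            return self._ucb(n, m)
        c = max(self.clusters, key=cluster_score)
        a = max(self.clusters[c], key=lambda a: self._ucb(self.n[(c, a)], self.mean[(c, a)]))
        return c, a

    def update(self, c, a, reward):
        self.n[(c, a)] += 1
        self.mean[(c, a)] += (reward - self.mean[(c, a)]) / self.n[(c, a)]

bandit = TwoLevelUCB({"lr": [1e-4, 1e-3, 1e-2], "entropy_coef": [0.0, 0.01, 0.1]})
for _ in range(50):
    c, a = bandit.select()
    reward = random.random()      # stand-in for an episode return under (c, a)
    bandit.update(c, a, reward)
```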
arXiv Detail & Related papers (2025-03-08T07:03:43Z)
- Grouped Sequential Optimization Strategy -- the Application of Hyperparameter Importance Assessment in Deep Learning [1.7778609937758323]
We implement a novel HPO strategy called 'Sequential Grouping'. Our experiments, validated across six additional image classification datasets, demonstrate that incorporating hyperparameter importance assessment (HIA) can significantly accelerate HPO without compromising model performance.
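To make the grouping idea concrete, here is a hedged Python sketch: hyperparameters are ranked by an importance score (dummy numbers below; the paper derives them via HIA), then tuned group by group while the rest stay fixed. Search space, group sizes, and the stand-in validator are hypothetical.

```python
import random

importance = {"lr": 0.60, "batch_size": 0.20, "dropout": 0.15, "weight_decay": 0.05}
defaults = {"lr": 1e-3, "batch_size": 64, "dropout": 0.1, "weight_decay": 0.0}
space = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128],
         "dropout": [0.0, 0.1, 0.3], "weight_decay": [0.0, 1e-4, 1e-2]}

def validate(cfg):
    # Stand-in for training the model and returning validation accuracy.
    return random.random()

ranked = sorted(importance, key=importance.get, reverse=True)
groups = [ranked[:2], ranked[2:]]            # most important pair tuned first
best = dict(defaults)
for group in groups:
    trials = []
    for _ in range(10):                      # random search over this group only
        cfg = dict(best)
        for name in group:
            cfg[name] = random.choice(space[name])
        trials.append((validate(cfg), cfg))
    best = max(trials, key=lambda t: t[0])[1]
print("selected configuration:", best)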
arXiv Detail & Related papers (2025-03-07T03:01:00Z)
- HyperQ-Opt: Q-learning for Hyperparameter Optimization [0.0]
This paper presents a novel perspective on HPO by formulating it as a sequential decision-making problem and leveraging Q-learning, a reinforcement learning technique. The approaches are evaluated for their ability to find optimal or near-optimal configurations within a limited number of trials. By shifting the paradigm toward policy-based optimization, this work contributes to advancing HPO methods for scalable and efficient machine learning applications.
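A minimal tabular Q-learning sketch of the sequential-decision framing above: states are discretized configurations, actions nudge one hyperparameter, and the reward is the change in a (stand-in) validation score. HyperQ-Opt's actual state/action design may differ; everything here is illustrative.

```python
import random
from collections import defaultdict

LRS = [1e-4, 1e-3, 1e-2]
DROPS = [0.0, 0.1, 0.3]
ACTIONS = ["lr_up", "lr_down", "drop_up", "drop_down"]
Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.2

def step(state, action):
    # Move one step along the chosen hyperparameter grid, clipped at edges.
    i, j = state
    if action == "lr_up":    i = min(i + 1, len(LRS) - 1)
    if action == "lr_down":  i = max(i - 1, 0)
    if action == "drop_up":  j = min(j + 1, len(DROPS) - 1)
    if action == "drop_down": j = max(j - 1, 0)
    return (i, j)

def score(state):
    # Stand-in validation accuracy with a peak at (lr=1e-3, dropout=0.1).
    i, j = state
    return -abs(i - 1) - abs(j - 1) + random.gauss(0, 0.01)

for _ in range(200):                          # short tuning episodes
    state = (random.randrange(3), random.randrange(3))
    for _ in range(5):
        if random.random() < eps:
            a = random.choice(ACTIONS)        # explore
        else:
            a = max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit
        nxt = step(state, a)
        reward = score(nxt) - score(state)    # improvement as reward
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
        state = nxt
```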
arXiv Detail & Related papers (2024-12-23T18:22:34Z)
- Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL).
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate the strong performance of HPO, which improves over the baselines by up to 35%.
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
- Accelerated Preference Optimization for Large Language Model Alignment [60.22606527763201]
Reinforcement Learning from Human Feedback (RLHF) has emerged as a pivotal tool for aligning large language models (LLMs) with human preferences.
Direct Preference Optimization (DPO) formulates RLHF as a policy optimization problem without explicitly estimating the reward function.
We propose a general Accelerated Preference Optimization (APO) framework, which unifies many existing preference optimization algorithms.
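For context on the DPO objective referenced above, a minimal sketch of the standard DPO loss follows; this is not APO's accelerated update, and the tensor values are toy placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss: maximize the beta-scaled margin between the
    policy's and the frozen reference model's preference for the chosen
    response over the rejected one."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

# Toy usage with random "log-probabilities" for a batch of 8 pairs.
lp_c, lp_r = torch.randn(8), torch.randn(8)
rf_c, rf_r = torch.randn(8), torch.randn(8)
print(dpo_loss(lp_c, lp_r, rf_c, rf_r))
```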
arXiv Detail & Related papers (2024-10-08T18:51:01Z)
- PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning [49.92394599459274]
We propose PriorBand, an HPO algorithm tailored to Deep Learning (DL) pipelines.
We show its robustness across a range of DL benchmarks, its gains under informative expert input, and its resilience against poor expert beliefs.
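A hedged sketch of the expert-prior idea described above: sample from the expert's belief with some probability and uniformly otherwise, so a poor prior cannot dominate. PriorBand's actual schedule also mixes in incumbent-based sampling and couples this with multi-fidelity racing; the numbers below are hypothetical.

```python
import random

def sample_lr(prior_weight=0.5):
    if random.random() < prior_weight:
        # Expert belief (hypothetical): log-normal around 3e-4.
        return 10 ** random.gauss(-3.5, 0.3)
    return 10 ** random.uniform(-6, -1)   # uniform in log-space fallback

print(sorted(sample_lr() for _ in range(5)))
```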
arXiv Detail & Related papers (2023-06-21T16:26:14Z)
- Hyperparameters in Reinforcement Learning and How To Tune Them [25.782420501870295]
We show that hyperparameter choices in deep reinforcement learning can significantly affect the agent's final performance and sample efficiency.
We propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds.
We support this by comparing state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts.
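A minimal sketch of the tuning/testing seed separation recommended above: tune on one set of seeds, report on a disjoint set, so the chosen configuration is not overfit to particular training seeds. The candidate configurations and the stand-in training function are hypothetical.

```python
import random

TUNING_SEEDS = [0, 1, 2]
TEST_SEEDS = [100, 101, 102, 103, 104]      # disjoint from tuning seeds

def train_and_eval(config, seed):
    # Stand-in for a full RL training run returning mean episode return.
    random.seed(seed)
    return random.random() - abs(config["lr"] - 1e-3)

def tune(candidates):
    def avg_tuning_score(cfg):
        return sum(train_and_eval(cfg, s) for s in TUNING_SEEDS) / len(TUNING_SEEDS)
    return max(candidates, key=avg_tuning_score)

best = tune([{"lr": 1e-4}, {"lr": 1e-3}, {"lr": 1e-2}])
final_score = sum(train_and_eval(best, s) for s in TEST_SEEDS) / len(TEST_SEEDS)
print(best, final_score)
```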
arXiv Detail & Related papers (2023-06-02T07:48:18Z)
- Two-step hyperparameter optimization method: Accelerating hyperparameter search by using a fraction of a training dataset [0.15420205433587747]
We present a two-step HPO method as a strategic solution to curbing computational demands and wait times.
We present our recent application of the two-step HPO method to the development of neural network emulators for aerosol activation.
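A hedged sketch of the two-step idea above: screen many configurations on a small fraction of the training data, then re-run only the top few on the full dataset. The fractions, counts, and toy objective are illustrative, not the paper's settings.

```python
import random

def train(config, data_fraction):
    # Stand-in returning a validation loss; smaller subsets are noisier.
    noise = random.gauss(0, 0.05 / data_fraction)
    return (config["lr"] - 1e-3) ** 2 + noise

candidates = [{"lr": 10 ** random.uniform(-5, -1)} for _ in range(100)]
# Step 1: cheap screening on 10% of the data.
screened = sorted(candidates, key=lambda c: train(c, data_fraction=0.1))
# Step 2: full-data evaluation of the 5 survivors.
best = min(screened[:5], key=lambda c: train(c, data_fraction=1.0))
print(best)
```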
arXiv Detail & Related papers (2023-02-08T02:38:26Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
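To illustrate the text-based interface mentioned above: an optimization trajectory can be serialized as plain text so a sequence model proposes the next configuration as a continuation. The format below is a hypothetical illustration, not OptFormer's actual serialization.

```python
# Render an HPO history as a text prompt for a sequence model to continue.
history = [({"lr": 1e-3, "dropout": 0.1}, 0.81),
           ({"lr": 1e-2, "dropout": 0.0}, 0.74)]
prompt = " | ".join(f"lr={cfg['lr']:g}, dropout={cfg['dropout']:g} -> acc={y:.2f}"
                    for cfg, y in history)
prompt += " | lr="          # the model would be asked to complete this string
print(prompt)
```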
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges [5.139260825952818]
This paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband and racing.
It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization.
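Of the methods the survey covers, a minimal successive-halving sketch (the racing idea underlying Hyperband, and the baseline family POCAII is contrasted with) is shown below; budgets, counts, and the toy objective are illustrative.

```python
import random

def evaluate(config, budget):
    # Stand-in for partial training: more budget means less noisy estimates.
    return (config["lr"] - 1e-3) ** 2 + random.gauss(0, 0.1 / budget)

configs = [{"lr": 10 ** random.uniform(-5, -1)} for _ in range(16)]
budget = 1
while len(configs) > 1:
    ranked = sorted(configs, key=lambda c: evaluate(c, budget))
    configs = ranked[: len(configs) // 2]    # keep the better half
    budget *= 2                              # double the budget for survivors
print(configs[0])
```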
arXiv Detail & Related papers (2021-07-13T04:55:47Z)
- Cost-Efficient Online Hyperparameter Optimization [94.60924644778558]
We propose an online HPO algorithm that reaches human expert-level performance within a single run of the experiment, while incurring only modest computational overhead compared to regular training.
arXiv Detail & Related papers (2021-01-17T04:55:30Z)
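A hedged sketch of the single-run idea above: instead of restarting training per configuration, a hyperparameter is adapted online from observed progress. The multiplicative update rule and toy loss dynamics below are illustrative only; the paper's algorithm and its overhead accounting are more involved.

```python
import random

lr, loss = 1e-2, 1.0
for step in range(200):
    # Toy training dynamics: loss shrinks faster with a larger (capped) lr.
    new_loss = loss * (1 - min(0.05, 2 * lr)) + random.gauss(0, 1e-3)
    if new_loss > loss:
        lr *= 0.7        # no progress: shrink the learning rate
    else:
        lr *= 1.05       # progress: probe a slightly larger step
    loss = new_loss
print("final lr %.2e, final loss %.4f" % (lr, loss))
```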