Related papers: Interim Report on Human-Guided Adaptive Hyperparameter Optimization with Multi-Fidelity Sprints

Interim Report on Human-Guided Adaptive Hyperparameter Optimization with Multi-Fidelity Sprints

URL: http://arxiv.org/abs/2505.09792v1
Date: Wed, 14 May 2025 20:38:44 GMT
Title: Interim Report on Human-Guided Adaptive Hyperparameter Optimization with Multi-Fidelity Sprints
Authors: Michael Kamfonas,
Abstract summary: This case study applies a phased hyperparameter optimization process to compare multitask natural language model variants.<n>We employ short, Bayesian optimization sessions that leverage multi-fidelity, hyperparameter space pruning, progressive halving, and a degree of human guidance.<n>We demonstrate our method on a collection of variants of the 2021 Joint Entity and Relation Extraction model proposed by Eberts and Ulges.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This case study applies a phased hyperparameter optimization process to compare multitask natural language model variants that utilize multiphase learning rate scheduling and optimizer parameter grouping. We employ short, Bayesian optimization sessions that leverage multi-fidelity, hyperparameter space pruning, progressive halving, and a degree of human guidance. We utilize the Optuna TPE sampler and Hyperband pruner, as well as the Scikit-Learn Gaussian process minimization. Initially, we use efficient low-fidelity sprints to prune the hyperparameter space. Subsequent sprints progressively increase their model fidelity and employ hyperband pruning for efficiency. A second aspect of our approach is using a meta-learner to tune threshold values to resolve classification probabilities during inference. We demonstrate our method on a collection of variants of the 2021 Joint Entity and Relation Extraction model proposed by Eberts and Ulges.

Related papers

Tune My Adam, Please! [42.01711296068661]
We propose Adam-PFN, a new surrogate model for Freeze-thaw BO of Adam's hyperparameters, pre-trained on learning curves from TaskSet.<n>Our approach improves both learning curve augmentation and hyperparameter optimization on TaskSet evaluation tasks, with strong performance on out-of-distribution tasks.
arXiv Detail & Related papers (2025-08-27T09:57:45Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.<n>Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work. Our empirical investigation includes tens of thousands of models trained with all combinations of threes. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values. We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO) Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
Improving Hyperparameter Learning under Approximate Inference in Gaussian Process Models [18.134776677795077]
We focus on the interplay between variational inference (VI) and the learning target. We design a hybrid training procedure to bring the best of both worlds: it leverages conjugate-computation VI for inference. We empirically demonstrate the effectiveness of our proposal across a wide range of data sets.
arXiv Detail & Related papers (2023-06-07T07:15:08Z)
Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimates the intractable marginal likelihood of deep generative models. We present a parameteric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer number of steps for sampling. We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
Optimizing Training Trajectories in Variational Autoencoders via Latent Bayesian Optimization Approach [0.0]
Unsupervised and semi-supervised ML methods have become widely adopted across multiple areas of physics, chemistry, and materials sciences. We propose a latent Bayesian optimization (zBO) approach for the hyper parameter trajectory optimization for the unsupervised and semi-supervised ML. We demonstrate an application of this method for finding joint discrete and continuous rotationally invariant representations for MNIST and experimental data of a plasmonic nanoparticles material system.
arXiv Detail & Related papers (2022-06-30T23:41:47Z)
Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction. Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation [0.0]
We develop an approximate hypergradient-based hyper parameter optimiser. It requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient.
arXiv Detail & Related papers (2021-10-20T09:57:57Z)
Pre-trained Gaussian Processes for Bayesian Optimization [24.730678780782647]
We propose a new pre-training based BO framework named HyperBO. We show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known.
arXiv Detail & Related papers (2021-09-16T20:46:26Z)
Amortized Auto-Tuning: Cost-Efficient Transfer Optimization for Hyperparameter Recommendation [83.85021205445662]
We propose an instantiation--amortized auto-tuning (AT2) to speed up tuning of machine learning models. We conduct a thorough analysis of the multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2)
arXiv Detail & Related papers (2021-06-17T00:01:18Z)
Hyperboost: Hyperparameter Optimization by Gradient Boosting surrogate models [0.4079265319364249]
Current state-of-the-art methods leverage Random Forests or Gaussian processes to build a surrogate model. We propose a new surrogate model based on gradient boosting. We demonstrate empirically that the new method is able to outperform some state-of-the art techniques across a reasonable sized set of classification problems.
arXiv Detail & Related papers (2021-01-06T22:07:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.