Using Large Language Models for Hyperparameter Optimization
- URL: http://arxiv.org/abs/2312.04528v1
- Date: Thu, 7 Dec 2023 18:46:50 GMT
- Title: Using Large Language Models for Hyperparameter Optimization
- Authors: Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba
- Abstract summary: This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO).
Empirical evaluations demonstrate that in settings with constrained search budgets, LLMs can perform comparably to or better than traditional HPO methods.
- Score: 31.537306578628556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies using foundational large language models (LLMs) to make
decisions during hyperparameter optimization (HPO). Empirical evaluations
demonstrate that in settings with constrained search budgets, LLMs can perform
comparably to or better than traditional HPO methods like random search and
Bayesian optimization on standard benchmarks. Furthermore, we propose to treat
the code specifying our model as a hyperparameter, which the LLM outputs, going
beyond the capabilities of existing HPO approaches. Our findings suggest that
LLMs are a promising tool for improving efficiency in the traditional
decision-making problem of hyperparameter optimization.
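The budget-limited loop the abstract describes (propose a configuration, evaluate it, feed the result back) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose_with_llm` is a hypothetical placeholder where a real system would send the text prompt built from the trial history to an LLM and parse its reply. Here it is a deterministic stub (it perturbs the best configuration so far) so the block runs offline. The search space `SPACE` and objective `toy_validation_loss` are also hypothetical, and a random-search baseline runs under the same budget for comparison.

```python
import json
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
History = List[Tuple[Config, float]]
Proposer = Callable[[str, History, random.Random], Config]

# Hypothetical search space: log10 learning rate and log10 weight decay.
SPACE = {"log_lr": (-5.0, -1.0), "log_wd": (-6.0, -2.0)}


def toy_validation_loss(cfg: Config) -> float:
    """Hypothetical objective standing in for a training run (minimum at log_lr=-3, log_wd=-4)."""
    return (cfg["log_lr"] + 3.0) ** 2 + 0.5 * (cfg["log_wd"] + 4.0) ** 2


def build_prompt(history: History) -> str:
    """Serialize the search space and past trials into text, as an LLM-based optimizer might be prompted."""
    lines = [f"Search space: {json.dumps(SPACE)}", "Previous trials (config -> loss):"]
    for cfg, loss in history:
        lines.append(f"{json.dumps(cfg)} -> {loss:.4f}")
    lines.append("Propose the next configuration as JSON.")
    return "\n".join(lines)


def clip(cfg: Config) -> Config:
    """Keep every hyperparameter inside its search-space bounds."""
    return {k: min(max(v, SPACE[k][0]), SPACE[k][1]) for k, v in cfg.items()}


def sample_uniform(rng: random.Random) -> Config:
    """Draw one configuration uniformly from the search space."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}


def propose_with_llm(prompt: str, history: History, rng: random.Random) -> Config:
    """HYPOTHETICAL stand-in for an LLM call. A real system would send `prompt` to a model
    and parse the JSON it returns; this stub ignores the prompt and perturbs the best trial."""
    if not history:
        return sample_uniform(rng)
    best_cfg, _ = min(history, key=lambda trial: trial[1])
    return clip({k: v + rng.gauss(0.0, 0.3) for k, v in best_cfg.items()})


def random_search(prompt: str, history: History, rng: random.Random) -> Config:
    """Baseline proposer: ignores the history entirely."""
    return sample_uniform(rng)


def run_hpo(proposer: Proposer, objective: Callable[[Config], float],
            budget: int, seed: int = 0) -> History:
    """Run `budget` propose/evaluate rounds and return the full trial history."""
    rng = random.Random(seed)
    history: History = []
    for _ in range(budget):
        cfg = proposer(build_prompt(history), history, rng)
        history.append((cfg, objective(cfg)))
    return history


if __name__ == "__main__":
    budget = 10  # a deliberately small (constrained) search budget
    for name, proposer in [("stub-LLM", propose_with_llm), ("random", random_search)]:
        hist = run_hpo(proposer, toy_validation_loss, budget, seed=0)
        print(name, "best loss:", min(loss for _, loss in hist))
```

Because the proposer is the only component that changes, replacing the stub with a real model call leaves the evaluation loop and the random-search baseline untouched, which keeps the comparison like-for-like under the same budget.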
Related papers
- In-the-loop Hyper-Parameter Optimization for LLM-Based Automated Design of Heuristics [0.020482269513546456]
Large Language Models (LLMs) have shown great potential in automatically generating and optimizing (meta)heuristics.
This paper presents a novel hybrid approach, LLaMEA-HPO, which integrates the open-source LLaMEA framework with a Hyper-Parameter Optimization (HPO) procedure in the loop.
arXiv Detail & Related papers (2024-10-07T14:04:31Z)
- Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs fine-tuned with MRPO generalize better across various preference datasets, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z)
- Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism [1.1463861912335864]
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities.
This paper asserts that LLMs possess the capability for zero-shot optimization across diverse scenarios.
We introduce a novel population-based method for numerical optimization using LLMs.
arXiv Detail & Related papers (2024-03-04T13:57:37Z)
- Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of prompt optimizers based on large language models (LLMs).
We identify two pivotal factors in model parameter learning: update direction and update method.
In particular, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads in practice to a compromised, mean-seeking approximation of the optimal solution.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
- Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML).
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
- Enhancing Explainability of Hyperparameter Optimization via Bayesian Algorithm Execution [13.037647287689438]
We study the combination of HPO with interpretable machine learning (IML) methods such as partial dependence plots.
We propose a modified HPO method which efficiently searches for optimum global predictive performance.
Our method returns more reliable explanations of the underlying black-box without a loss of optimization performance.
arXiv Detail & Related papers (2022-06-11T07:12:04Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- A Comparative study of Hyper-Parameter Optimization Tools [2.6097538974670935]
We compare the performance of four Python libraries, namely Optuna, Hyperopt, Optunity, and Sequential Model-based Algorithm Configuration (SMAC).
We found that Optuna performs better on the CASH problem and the NeurIPS black-box optimization challenge.
arXiv Detail & Related papers (2022-01-17T14:49:36Z)