Is One Epoch All You Need For Multi-Fidelity Hyperparameter Optimization?
- URL: http://arxiv.org/abs/2307.15422v2
- Date: Tue, 26 Sep 2023 07:08:36 GMT
- Title: Is One Epoch All You Need For Multi-Fidelity Hyperparameter Optimization?
- Authors: Romain Egele, Isabelle Guyon, Yixuan Sun, Prasanna Balaprakash
- Abstract summary: Multi-fidelity HPO (MF-HPO) leverages intermediate accuracy levels in the learning process and discards low-performing models early on.
We compared various representative MF-HPO methods against a simple baseline on classical benchmark data.
This baseline achieved similar results to its counterparts, while requiring an order of magnitude less computation.
- Score: 17.21160278797221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter optimization (HPO) is crucial for fine-tuning machine learning
models but can be computationally expensive. To reduce costs, Multi-fidelity
HPO (MF-HPO) leverages intermediate accuracy levels in the learning process and
discards low-performing models early on. We compared various representative
MF-HPO methods against a simple baseline on classical benchmark data. The
baseline involved discarding all models except the Top-K after training for
only one epoch, followed by further training to select the best model.
Surprisingly, this baseline achieved similar results to its counterparts, while
requiring an order of magnitude less computation. Upon analyzing the learning
curves of the benchmark data, we observed a few dominant learning curves, which
explained the success of our baseline. This suggests that researchers should
(1) always use the suggested baseline in benchmarks and (2) broaden the
diversity of MF-HPO benchmarks to include more complex cases.
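For concreteness, a minimal sketch of this Top-K one-epoch baseline; `make_model`, `train_one_epoch`, and `validate` are hypothetical stand-ins for the user's training loop:

```python
def top_k_one_epoch_baseline(configs, k, total_epochs,
                             make_model, train_one_epoch, validate):
    # Stage 1: train every candidate configuration for a single epoch.
    models = [make_model(cfg) for cfg in configs]
    scores = []
    for model in models:
        train_one_epoch(model)
        scores.append(validate(model))

    # Stage 2: discard all but the Top-K and train those to completion.
    top_k = sorted(range(len(configs)),
                   key=lambda i: scores[i], reverse=True)[:k]
    for i in top_k:
        for _ in range(total_epochs - 1):  # one epoch already done
            train_one_epoch(models[i])
        scores[i] = validate(models[i])

    # Select the best fully trained model.
    best = max(top_k, key=lambda i: scores[i])
    return configs[best], models[best]
```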
Related papers
- Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback [64.67540769692074]
Large language models (LLMs) fine-tuned with alignment techniques, such as reinforcement learning from human feedback, have been instrumental in developing some of the most capable AI systems to date.
We introduce an approach called Margin Matching Preference Optimization (MMPO), which incorporates relative quality margins into optimization, leading to improved LLM policies and reward models.
Experiments with both human and AI feedback data demonstrate that MMPO consistently outperforms baseline methods, often by a substantial margin, on popular benchmarks including MT-bench and RewardBench.
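Illustrative only, not necessarily MMPO's exact objective: one way to fold a granular quality margin into a pairwise preference loss is to soften the usual hard 0/1 preference target by the margin, here assumed to be a normalized rating gap in [0, 1]:

```python
import torch.nn.functional as F

def margin_preference_loss(logp_chosen, logp_rejected, margin, beta=0.1):
    # logp_*: summed log-probabilities of each response under the policy
    # margin: assumed normalized quality gap in [0, 1] from granular feedback
    logits = beta * (logp_chosen - logp_rejected)
    # Soft target: small quality gaps pull the policy less strongly
    # than large ones, instead of a hard 0/1 label.
    target = 0.5 + 0.5 * margin
    return F.binary_cross_entropy_with_logits(logits, target)
```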
arXiv Detail & Related papers (2024-10-04T04:56:11Z)
- Preference Learning Algorithms Do Not Learn Preference Rankings [62.335733662381884]
We study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs.
We find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets.
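A minimal sketch of the ranking-accuracy metric in question: the fraction of preference pairs on which the model assigns the preferred response the higher likelihood (`logprob` is a hypothetical scoring helper):

```python
def ranking_accuracy(model, pairs, logprob):
    # pairs: list of (prompt, chosen, rejected) triples
    hits = sum(
        logprob(model, prompt, chosen) > logprob(model, prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return hits / len(pairs)
```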
arXiv Detail & Related papers (2024-05-29T21:29:44Z)
- From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function [50.812404038684505]
We show that we can derive DPO in the token-level MDP as a general inverse Q-learning algorithm, which satisfies the Bellman equation.
We discuss applications of our work, including information elicitation in multi-turn dialogue, reasoning, agentic applications and end-to-end training of multi-model systems.
arXiv Detail & Related papers (2024-04-18T17:37:02Z)
- Stabilizing Subject Transfer in EEG Classification with Divergence Estimation [17.924276728038304]
We propose several graphical models to describe an EEG classification task.
We identify statistical relationships that should hold true in an idealized training scenario.
We design regularization penalties to enforce these relationships in two stages.
arXiv Detail & Related papers (2023-10-12T23:06:52Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
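A minimal sketch of the compute-equivalence idea, under the assumption that each model's token budget is simply its measured throughput times a shared accelerator-hour budget:

```python
def token_budget(tokens_per_hour, accelerator_hours):
    # Faster models earn a larger token budget under the same compute.
    return int(tokens_per_hour * accelerator_hours)

# e.g. 2M tokens/hour under a 6-hour budget -> train on 12M tokens
assert token_budget(2_000_000, 6) == 12_000_000
```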
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings [6.463202903076821]
We compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited.
Early-Exit provides a better speed-accuracy trade-off because the Multi-Model approach incurs extra overhead.
We propose SWEET, an Early-Exit fine-tuning method that assigns each classifier its own set of unique model weights.
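For background, a generic sketch of Early-Exit inference with a confidence threshold (batch size 1; `blocks` and `exit_heads` are hypothetical module lists); SWEET itself concerns how the exit classifiers are fine-tuned:

```python
import torch

@torch.no_grad()
def early_exit_predict(x, blocks, exit_heads, threshold=0.9):
    # Assumes batch size 1; stops at the first sufficiently confident exit.
    h = x
    for block, head in zip(blocks, exit_heads):
        h = block(h)
        probs = torch.softmax(head(h), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:
            return pred  # exit early and skip the remaining blocks
    return pred  # no exit fired: fall through to the final classifier
```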
arXiv Detail & Related papers (2023-06-04T09:16:39Z)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model [119.65409513119963]
We introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form.
The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight.
Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods.
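A minimal sketch of the DPO objective as commonly stated, assuming summed per-response log-probabilities under the policy and a frozen reference model:

```python
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # *_w / *_l: summed log-probs of the preferred / rejected responses
    pi_logratio = policy_logp_w - policy_logp_l
    ref_logratio = ref_logp_w - ref_logp_l
    # Increase the policy's preference margin relative to the reference.
    return -F.logsigmoid(beta * (pi_logratio - ref_logratio)).mean()
```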
arXiv Detail & Related papers (2023-05-29T17:57:46Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
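Illustrative only, not necessarily the paper's procedure: one plausible reading of distilling target knowledge from CLIP is to train on CLIP's zero-shot predictions as soft labels (`clip_zero_shot_logits` is a hypothetical helper returning CLIP similarity logits over the class names):

```python
import torch
import torch.nn.functional as F

def clip_distillation_step(model, images, clip_zero_shot_logits,
                           optimizer, temperature=1.0):
    with torch.no_grad():
        # CLIP's zero-shot class distribution acts as a soft teacher label.
        teacher = F.softmax(clip_zero_shot_logits(images) / temperature, dim=-1)
    student_logp = F.log_softmax(model(images), dim=-1)
    loss = F.kl_div(student_logp, teacher, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```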
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Two-step hyperparameter optimization method: Accelerating hyperparameter search by using a fraction of a training dataset [0.15420205433587747]
We present a two-step HPO method as a strategic solution to curbing computational demands and wait times.
We present our recent application of the two-step HPO method to the development of neural network emulators for aerosol activation.
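A minimal sketch of the two-step idea, assuming a hypothetical `train_and_score(cfg, data_fraction)` that trains on the given fraction of the training set and returns a validation score:

```python
def two_step_hpo(sample_config, train_and_score, n_trials=100,
                 fraction=0.1, n_finalists=5):
    # Step 1: cheap, broad search on a small fraction of the training data.
    trials = [sample_config() for _ in range(n_trials)]
    coarse = [(train_and_score(cfg, data_fraction=fraction), cfg)
              for cfg in trials]
    coarse.sort(key=lambda pair: pair[0], reverse=True)
    finalists = [cfg for _, cfg in coarse[:n_finalists]]
    # Step 2: re-evaluate only the survivors on the full dataset.
    final = [(train_and_score(cfg, data_fraction=1.0), cfg)
             for cfg in finalists]
    return max(final, key=lambda pair: pair[0])[1]
```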
arXiv Detail & Related papers (2023-02-08T02:38:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.