Is One Epoch All You Need For Multi-Fidelity Hyperparameter
Optimization?
- URL: http://arxiv.org/abs/2307.15422v2
- Date: Tue, 26 Sep 2023 07:08:36 GMT
- Authors: Romain Egele, Isabelle Guyon, Yixuan Sun, Prasanna Balaprakash
- Abstract summary: Multi-fidelity HPO (MF-HPO) leverages intermediate accuracy levels in the learning process and discards low-performing models early on.
We compared various representative MF-HPO methods against a simple baseline on classical benchmark data.
This baseline achieved similar results to its counterparts, while requiring an order of magnitude less computation.
- Score: 17.21160278797221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter optimization (HPO) is crucial for fine-tuning machine learning
models but can be computationally expensive. To reduce costs, Multi-fidelity
HPO (MF-HPO) leverages intermediate accuracy levels in the learning process and
discards low-performing models early on. We compared various representative
MF-HPO methods against a simple baseline on classical benchmark data. The
baseline involved discarding all models except the Top-K after training for
only one epoch, followed by further training to select the best model.
Surprisingly, this baseline achieved similar results to its counterparts, while
requiring an order of magnitude less computation. Upon analyzing the learning
curves of the benchmark data, we observed a few dominant learning curves, which
explained the success of our baseline. This suggests that researchers should
(1) always use the suggested baseline in benchmarks and (2) broaden the
diversity of MF-HPO benchmarks to include more complex cases.
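The baseline described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the names `one_epoch_top_k`, `toy_score`, and `CANDIDATES`, the saturating learning-curve formula, and the cost accounting (survivors resume from their one-epoch checkpoint) are all assumptions made for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Config = dict  # hypothetical representation of one hyperparameter configuration


def toy_score(config: Config, epoch: int) -> float:
    """Hypothetical validation score after `epoch` epochs (higher is better).

    Stands in for real training with a saturating curve:
    final * (1 - exp(-rate * epoch)).
    """
    return config["final"] * (1.0 - math.exp(-config["rate"] * epoch))


def one_epoch_top_k(
    configs: Sequence[Config],
    score_at_epoch: Callable[[Config, int], float],
    k: int,
    max_epochs: int,
) -> Tuple[Config, float, int]:
    """Keep the Top-K configurations after one epoch, then train them fully.

    Returns (best_config, best_final_score, total_epochs_spent).
    """
    if k < 1 or max_epochs < 1 or not configs:
        raise ValueError("need at least one config, k >= 1 and max_epochs >= 1")

    # Step 1: evaluate every candidate after a single epoch.
    first_epoch = [score_at_epoch(c, 1) for c in configs]

    # Step 2: discard everything except the Top-K by one-epoch score.
    ranked: List[int] = sorted(
        range(len(configs)), key=lambda i: first_epoch[i], reverse=True
    )
    survivors = ranked[:k]

    # Step 3: train the survivors further and select the best final model.
    final_scores = {i: score_at_epoch(configs[i], max_epochs) for i in survivors}
    best = max(survivors, key=lambda i: final_scores[i])

    # Assumption: survivors resume from their epoch-1 checkpoint, so each one
    # costs (max_epochs - 1) additional epochs.
    epochs_spent = len(configs) + len(survivors) * (max_epochs - 1)
    return configs[best], final_scores[best], epochs_spent


# Hypothetical candidates; "a" learns slowly but ends up better than "b".
CANDIDATES = [
    {"name": "a", "final": 0.90, "rate": 0.5},
    {"name": "b", "final": 0.80, "rate": 1.0},
    {"name": "c", "final": 0.50, "rate": 2.0},
    {"name": "d", "final": 0.95, "rate": 0.1},
]

best, score, cost = one_epoch_top_k(CANDIDATES, toy_score, k=2, max_epochs=10)
print(best["name"], round(score, 3), cost)
```

In this toy setting the learning curves cross, so a small K can discard a slow starter that would have finished higher. The abstract attributes the baseline's success on the benchmarks to a few dominant learning curves, which is the case where one-epoch rankings are informative.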
Related papers
- Preference Learning Algorithms Do Not Learn Preference Rankings [62.335733662381884]
We show that most preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets.
We attribute this discrepancy to the DPO objective, which is empirically and theoretically ill-suited to fix even mild ranking errors.
arXiv Detail & Related papers (2024-05-29T21:29:44Z)
- From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function [50.812404038684505]
Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models.
Direct Preference Optimization (DPO) has emerged as an alternative approach.
DPO solves the same objective as the standard RLHF setup, but there is a mismatch between the two approaches.
arXiv Detail & Related papers (2024-04-18T17:37:02Z)
- Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS).
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z)
- Stabilizing Subject Transfer in EEG Classification with Divergence Estimation [17.924276728038304]
We propose several graphical models to describe an EEG classification task.
We identify statistical relationships that should hold true in an idealized training scenario.
We design regularization penalties to enforce these relationships in two stages.
arXiv Detail & Related papers (2023-10-12T23:06:52Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings [6.463202903076821]
We compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited.
Early-Exit provides a better speed-accuracy trade-off due to the overhead of the Multi-Model approach.
We propose SWEET, an Early-Exit fine-tuning method that assigns each classifier its own set of unique model weights.
arXiv Detail & Related papers (2023-06-04T09:16:39Z)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model [126.78737228677025]
We introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form.
The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight.
Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods.
arXiv Detail & Related papers (2023-05-29T17:57:46Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Two-step hyperparameter optimization method: Accelerating hyperparameter search by using a fraction of a training dataset [0.15420205433587747]
We present a two-step HPO method as a strategic solution to curbing computational demands and wait times.
We present our recent application of the two-step HPO method to the development of neural network emulators for aerosol activation.
arXiv Detail & Related papers (2023-02-08T02:38:26Z)
- Multi-objective Asynchronous Successive Halving [10.632606255280649]
We propose algorithms that extend asynchronous successive halving (ASHA) to the multi-objective (MO) setting.
Our empirical analysis shows that MO ASHA makes it possible to perform MO HPO at scale.
Our algorithms establish new baselines for future research in the area.
arXiv Detail & Related papers (2021-06-23T19:39:31Z)