Related papers: LMEMs for post-hoc analysis of HPO Benchmarking

URL: http://arxiv.org/abs/2408.02533v1
Date: Mon, 5 Aug 2024 15:03:19 GMT
Title: LMEMs for post-hoc analysis of HPO Benchmarking
Authors: Anton Geburek, Neeratyoy Mallik, Danny Stoll, Xavier Bouthillier, Frank Hutter
Abstract summary: We apply significance testing based on Linear Mixed-Effect Models (LMEMs) for post-hoc analysis of HPO benchmarking runs. LMEMs allow flexible and expressive modeling on the entire experiment data, including information such as benchmark meta-features. We demonstrate this through a case study on the PriorBand paper's experiment data to find insights not reported in the original work.
Score: 38.39259273088395
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The importance of tuning hyperparameters in Machine Learning (ML) and Deep Learning (DL) is established through empirical research and applications, evident from the increase in new hyperparameter optimization (HPO) algorithms and benchmarks steadily added by the community. However, current benchmarking practices using averaged performance across many datasets may obscure key differences between HPO methods, especially for pairwise comparisons. In this work, we apply significance testing based on Linear Mixed-Effect Models (LMEMs) for post-hoc analysis of HPO benchmarking runs. LMEMs allow flexible and expressive modeling on the entire experiment data, including information such as benchmark meta-features, offering deeper insights than current analysis practices. We demonstrate this through a case study on the PriorBand paper's experiment data to find insights not reported in the original work.

Related papers

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis [89.60263788590893]
Post-training quantization (PTQ) has been extensively adopted for compressing large language models (LLMs). Existing algorithms focus primarily on performance, overlooking the trade-off among model size, performance, and quantization bitwidth. We provide a novel benchmark for LLM PTQ in this paper.
arXiv Detail & Related papers (2025-02-18T07:35:35Z)
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning [35.62208317531141]
We advocate and introduce the unrolling paradigm, also referred to as "learning to optimize". Our unrolling approach covers various statistical feature distributions and pre-training paradigms. We report comprehensive experiments, which cover a breadth of fine-grained downstream image classification tasks.
arXiv Detail & Related papers (2024-12-21T19:01:57Z)
Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations. We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
Using Large Language Models for Hyperparameter Optimization [29.395931874196805]
This paper explores the use of foundational large language models (LLMs) in hyperparameter optimization (HPO). Our empirical evaluations on standard benchmarks reveal that within constrained search budgets, LLMs can match or outperform traditional HPO methods.
arXiv Detail & Related papers (2023-12-07T18:46:50Z)
Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML). Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
Is One Epoch All You Need For Multi-Fidelity Hyperparameter Optimization? [17.21160278797221]
Multi-fidelity HPO (MF-HPO) leverages intermediate accuracy levels in the learning process and discards low-performing models early on. We compared various representative MF-HPO methods against a simple baseline on classical benchmark data. This baseline achieved similar results to its counterparts, while requiring an order of magnitude less computation.
arXiv Detail & Related papers (2023-07-28T09:14:41Z)
Optimizing Hyperparameters with Conformal Quantile Regression [7.316604052864345]
We propose to leverage conformalized quantile regression, which makes minimal assumptions about the observation noise. This translates to quicker HPO convergence on empirical benchmarks.
arXiv Detail & Related papers (2023-05-05T15:33:39Z)
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization [50.12374973760274]
We propose and implement a benchmark suite FedHPO-B that incorporates comprehensive FL tasks, enables efficient function evaluations, and eases continuing extensions. We also conduct extensive experiments based on FedHPO-B to benchmark a few HPO methods.
arXiv Detail & Related papers (2022-06-08T15:29:10Z)
Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data [66.11139091362078]
We provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics. Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks.
arXiv Detail & Related papers (2022-02-06T20:07:35Z)
BenchML: an extensible pipelining framework for benchmarking representations of materials and molecules at scale [0.0]
We introduce a machine-learning framework for benchmarking representations of chemical systems against datasets of materials and molecules. The guiding principle is to evaluate raw descriptor performance by limiting model complexity to simple regression schemes. The resulting models are intended as baselines that can inform future method development.
arXiv Detail & Related papers (2021-12-04T09:07:16Z)
Revisiting Training Strategies and Generalization Performance in Deep Metric Learning [28.54755295856929]
We revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices. Under consistent comparison, DML objectives show much higher saturation than indicated by the literature. Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models.
arXiv Detail & Related papers (2020-02-19T22:16:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.