Meta-Learning for Symbolic Hyperparameter Defaults
- URL: http://arxiv.org/abs/2106.05767v2
- Date: Fri, 11 Jun 2021 08:55:58 GMT
- Title: Meta-Learning for Symbolic Hyperparameter Defaults
- Authors: Pieter Gijsbers, Florian Pfisterer, Jan N. van Rijn, Bernd Bischl and
Joaquin Vanschoren
- Abstract summary: Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data.
We propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset.
This enables a much faster, but still data-dependent, configuration of the ML algorithm.
- Score: 2.928016570228877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter optimization in machine learning (ML) deals with the problem
of empirically learning an optimal algorithm configuration from data, usually
formulated as a black-box optimization problem. In this work, we propose a
zero-shot method to meta-learn symbolic default hyperparameter configurations
that are expressed in terms of the properties of the dataset. This enables a
much faster, but still data-dependent, configuration of the ML algorithm,
compared to standard hyperparameter optimization approaches. In the past,
symbolic and static default values have usually been obtained as hand-crafted
heuristics. We propose an approach of learning such symbolic configurations as
formulas of dataset properties from a large set of prior evaluations on
multiple datasets by optimizing over a grammar of expressions using an
evolutionary algorithm. We evaluate our method on surrogate empirical
performance models as well as on real data across 6 ML algorithms on more than
100 datasets and demonstrate that our method indeed finds viable symbolic
defaults.
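As an illustration of the approach described above, here is a minimal, self-contained sketch: candidate symbolic defaults are expression trees over dataset meta-features (here just n and p), mutated and selected by a simple (mu + lambda) evolutionary loop. The meta-dataset, grammar, and fitness function are toy stand-ins; the paper scores candidates against surrogate empirical performance models over 100+ datasets.

```python
import math
import random

# Hypothetical meta-dataset: for each prior dataset, its meta-features and the
# hyperparameter value that performed best in earlier (offline) evaluations.
# In the paper, candidates are instead scored via surrogate performance models.
META = [
    {"n": 1000, "p": 20, "best": 0.22},
    {"n": 5000, "p": 50, "best": 0.10},
    {"n": 200, "p": 10, "best": 0.32},
    {"n": 20000, "p": 100, "best": 0.07},
]

FEATURES = ["n", "p"]
UNARY = {"log": lambda a: math.log(max(a, 1e-12)),
         "sqrt": lambda a: math.sqrt(max(a, 0.0))}
BINARY = {"+": lambda a, b: a + b,
          "*": lambda a, b: a * b,
          "/": lambda a, b: a / b if abs(b) > 1e-12 else 0.0}

def random_expr(depth=0):
    """Sample a random expression tree from the toy grammar."""
    r = random.random()
    if depth >= 3 or r < 0.3:  # leaf: a meta-feature or a constant
        return random.choice(FEATURES) if random.random() < 0.5 else round(random.uniform(0.1, 2.0), 2)
    if r < 0.6:
        return (random.choice(list(UNARY)), random_expr(depth + 1))
    return (random.choice(list(BINARY)), random_expr(depth + 1), random_expr(depth + 1))

def evaluate(expr, meta):
    """Evaluate an expression tree on one dataset's meta-features."""
    if isinstance(expr, str):
        return float(meta[expr])
    if isinstance(expr, (int, float)):
        return float(expr)
    if len(expr) == 2:
        return UNARY[expr[0]](evaluate(expr[1], meta))
    return BINARY[expr[0]](evaluate(expr[1], meta), evaluate(expr[2], meta))

def fitness(expr):
    # Negative mean squared error between the symbolic default and best-known value.
    return -sum((evaluate(expr, m) - m["best"]) ** 2 for m in META) / len(META)

def mutate(expr):
    # Crude mutation: occasionally regrow the whole tree, else keep the parent.
    return random_expr() if random.random() < 0.3 else expr

# (mu + lambda) evolution over expression trees.
pop = [random_expr() for _ in range(50)]
for gen in range(100):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(40)]
print("best symbolic default:", max(pop, key=fitness))
```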
Related papers
- Transfer Learning in $\ell_1$ Regularized Regression: Hyperparameter
Selection Strategy based on Sharp Asymptotic Analysis [4.178980693837599]
Transfer learning techniques aim to leverage information from multiple related datasets to improve prediction quality on a target dataset.
Several Lasso-based algorithms have been proposed for this setting, including Trans-Lasso and Pretraining Lasso.
We conduct a thorough, precise study of these algorithms in a high-dimensional setting via an analysis using the replica method.
Our approach reveals a surprisingly simple behavior of the algorithm: Ignoring one of the two types of information transferred to the fine-tuning stage has little effect on generalization performance.
arXiv Detail & Related papers (2024-09-26T10:20:59Z)
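A hedged sketch of the two-stage transfer idea behind the entry above: fit a Lasso on the source data, then fit a second Lasso on the target residuals, so the final estimate is the source estimate plus a sparse correction. The data, penalty values, and single-source setup below are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical data: the source task shares most of its signal with the target.
beta_true = np.zeros(50)
beta_true[:5] = 1.0
X_src = rng.normal(size=(500, 50))
y_src = X_src @ (beta_true + 0.1 * rng.normal(size=50)) + rng.normal(size=500)
X_tgt = rng.normal(size=(60, 50))
y_tgt = X_tgt @ beta_true + rng.normal(size=60)

# Stage 1: rough coefficient estimate from the (larger) source data.
w = Lasso(alpha=0.05).fit(X_src, y_src).coef_

# Stage 2: Lasso on the target *residuals* learns a sparse correction delta,
# so the final estimate is w + delta (the Trans-Lasso-style correction step).
delta = Lasso(alpha=0.1).fit(X_tgt, y_tgt - X_tgt @ w).coef_
beta_hat = w + delta

print("target-only error:", np.linalg.norm(Lasso(alpha=0.1).fit(X_tgt, y_tgt).coef_ - beta_true))
print("transfer error:   ", np.linalg.norm(beta_hat - beta_true))
```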
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
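A minimal sketch of why factored structure helps offline optimization, under the assumption that the factorization is already known (the entry notes the paper can also infer it): fit one surrogate per factor from logged data and optimize each low-dimensional factor independently. The objective, dataset, and grid optimizer are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical offline dataset from a black box that (unknown to a naive
# optimizer) factors as f(x) = fA(x[:2]) + fB(x[2:]).
X = rng.uniform(-2, 2, size=(400, 4))
y = -(X[:, 0] ** 2 + X[:, 1] ** 2) + np.sin(X[:, 2]) * np.cos(X[:, 3])

# Assume the factorization {0,1} | {2,3} is given; fit one surrogate per factor.
# The other factor's contribution acts as additive noise for each surrogate.
fA = RandomForestRegressor(random_state=0).fit(X[:, :2], y)
fB = RandomForestRegressor(random_state=0).fit(X[:, 2:], y)

# Optimize each 2-D factor independently over a dense grid -- exponentially
# cheaper than searching the joint 4-D space for a fine enough grid.
grid = np.stack(np.meshgrid(*[np.linspace(-2, 2, 40)] * 2), -1).reshape(-1, 2)
xA = grid[np.argmax(fA.predict(grid))]
xB = grid[np.argmax(fB.predict(grid))]
print("proposed design:", np.concatenate([xA, xB]))
```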
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
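A deliberately rough sketch of the low-memory idea only: visit parameters one at a time, keep a single scalar second-moment statistic per tensor, and free each gradient immediately. The real AdaLomo uses factored (Adafactor-style) statistics and fuses the update into the backward pass; nothing below is its exact update rule.

```python
import torch

def low_memory_adaptive_step(model, state, lr=1e-3, beta2=0.99, eps=1e-8):
    """Rough illustration of a low-memory adaptive update: one scalar
    second-moment statistic per parameter tensor instead of Adam's full
    per-element moments, and gradients released as soon as they are used."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            v = state.get(name, torch.zeros((), device=p.device))
            v = beta2 * v + (1 - beta2) * p.grad.pow(2).mean()  # scalar statistic
            state[name] = v
            p.add_(p.grad, alpha=-lr / (v.sqrt().item() + eps))  # adaptive step
            p.grad = None  # release gradient memory right away

model = torch.nn.Linear(10, 1)
state = {}
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
low_memory_adaptive_step(model, state)
```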
- DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems [0.755972004983746]
We propose a feature extraction method that describes the trajectories of optimization algorithms using simple statistics.
We demonstrate that the proposed DynamoRep features capture enough information to identify the problem class on which the optimization algorithm is running.
arXiv Detail & Related papers (2023-06-08T06:57:07Z)
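A sketch of the trajectory-feature idea from the entry above: run a simple population-based search and record elementary statistics (min/max/mean/std) of the population at every iteration, concatenated into a fixed-length feature vector for a downstream problem-class classifier. The search loop and test problems are stand-ins for the standard algorithms and BBOB problems used in the paper.

```python
import numpy as np

def trajectory_features(f, dim=5, pop_size=20, iters=10, seed=0):
    """Run a crude elitist random search and collect per-iteration statistics
    of the population's objective values and coordinates as one flat vector."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    feats = []
    for _ in range(iters):
        fx = np.apply_along_axis(f, 1, pop)
        feats += [fx.min(), fx.max(), fx.mean(), fx.std(),
                  pop.mean(), pop.std()]
        elite = pop[np.argsort(fx)[: pop_size // 4]]  # keep best quarter
        pop = np.repeat(elite, 4, axis=0) + rng.normal(0, 0.5, size=(pop_size, dim))
    return np.array(feats)

sphere = lambda x: float(np.sum(x ** 2))
rastrigin = lambda x: float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))
# Feature vectors for two problem classes -- inputs to a downstream classifier.
print(trajectory_features(sphere)[:6])
print(trajectory_features(rastrigin)[:6])
```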
- Numerical Optimizations for Weighted Low-rank Estimation on Language Model [73.12941276331316]
Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices.
Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption.
We show that our method can outperform current state-of-the-art methods on neural language models.
arXiv Detail & Related papers (2022-11-02T00:58:02Z)
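A minimal sketch of importance-weighted low-rank approximation: scale rows by (hypothetical) importance weights before the SVD and undo the scaling afterwards, so important rows are reconstructed more faithfully; uniform weights recover standard truncated SVD. The paper derives its weights from the task loss and its algorithm differs in detail.

```python
import numpy as np

def weighted_svd_compress(W, row_importance, rank):
    """Importance-weighted truncated SVD via row scaling: factorize D^{1/2} W,
    then fold the scaling back into the left factor."""
    d = np.sqrt(row_importance)
    U, s, Vt = np.linalg.svd(d[:, None] * W, full_matrices=False)
    A = (U[:, :rank] * s[:rank]) / d[:, None]  # left factor, unscaled
    B = Vt[:rank]                              # right factor
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
imp = rng.uniform(0.1, 10.0, size=64)          # hypothetical per-row importances
A, B = weighted_svd_compress(W, imp, rank=8)
err = np.abs(W - A @ B)
print("importance-weighted mean abs error:", float((imp * err.mean(axis=1)).mean()))
```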
- MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning [19.5917119072985]
We cast contrastive learning as a binary classification problem: predict whether a given pair is positive or not.
The proposed method outperforms state-of-the-art algorithms on benchmark datasets such as STL-10, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2021-11-24T17:51:29Z)
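A sketch of the binary-classification framing only: treat temperature-scaled cosine similarities of embedding pairs as logits, label same-image pairs 1 and cross-image pairs 0, and train with binary cross-entropy. MIO's actual loss, mutual-information connection, and architecture are not reproduced here.

```python
import torch
import torch.nn.functional as F

def binary_contrastive_loss(z1, z2, temperature=0.1):
    """Contrastive learning as binary pair classification: the (i, j) entry of
    the similarity matrix is the logit for 'views i and j come from the same
    image'; the diagonal holds the positive pairs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature
    labels = torch.eye(len(z1), device=z1.device)  # diagonal pairs are positive
    return F.binary_cross_entropy_with_logits(logits, labels)

# Toy usage with random "embeddings" of two views of the same 8 images.
z = torch.randn(8, 128)
loss = binary_contrastive_loss(z + 0.1 * torch.randn_like(z), z)
print(loss.item())
```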
- Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
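A compact sketch of the conservative-training idea from the entry above: alongside the regression loss, find inputs by gradient ascent on the current model and penalize high predictions there, pushing the model to lower-bound the true objective off-support. The step sizes, penalty weight, and toy design problem are assumptions, not the paper's settings.

```python
import torch

# Conservative objective model sketch: regression loss plus a penalty that
# lowers predictions at adversarially ascended (out-of-distribution) inputs.
model = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.rand(256, 4)                         # hypothetical offline designs
y = -((X - 0.5) ** 2).sum(1, keepdim=True)     # and their measured scores

for step in range(200):
    # Adversarial inputs: a few ascent steps on the current model from data.
    x_adv = X.clone().requires_grad_(True)
    for _ in range(3):
        grad, = torch.autograd.grad(model(x_adv).sum(), x_adv)
        x_adv = (x_adv + 0.05 * grad).detach().requires_grad_(True)
    mse = ((model(X) - y) ** 2).mean()
    conservatism = model(x_adv).mean() - model(X).mean()  # penalize OOD optimism
    loss = mse + 0.5 * conservatism
    opt.zero_grad()
    loss.backward()
    opt.step()
print("trained conservative model; final penalty:", conservatism.item())
```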
- Data-driven Weight Initialization with Sylvester Solvers [72.11163104763071]
We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
arXiv Detail & Related papers (2021-05-02T07:33:16Z)
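A hedged sketch of the "initialize by solving, not sampling" idea: choose a layer's weights as the least-squares map from a batch of real input activations to a (here: random) target output. The paper solves related Sylvester equations layer by layer; this illustrates only the data-driven flavor.

```python
import numpy as np

def least_squares_init(acts_in, acts_target):
    """Pick a linear layer's weights by solving a least-squares problem so
    that, on a batch of real inputs, the layer's pre-activations match a
    desired target, instead of sampling weights at random."""
    W, *_ = np.linalg.lstsq(acts_in, acts_target, rcond=None)
    return W.T  # (out_dim, in_dim), ready to copy into a linear layer

rng = np.random.default_rng(0)
acts_in = rng.normal(size=(512, 64))   # activations entering the layer
target = rng.normal(size=(512, 32))    # hypothetical target pre-activations
W0 = least_squares_init(acts_in, target)
print(W0.shape, "residual:", float(np.linalg.norm(acts_in @ W0.T - target)))
```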
- Meta Learning Black-Box Population-Based Optimizers [0.0]
We propose the use of meta-learning to infer population-based black-box optimizers.
We show that the meta-loss function encourages a learned algorithm to alter its search behavior so that it can easily adapt to a new context.
arXiv Detail & Related papers (2021-03-05T08:13:25Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems, such as meta-learning and hyperparameter optimization.
We propose stocBiO, a novel stochastic bilevel optimizer with a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
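A small bilevel example in the spirit of the entry above: the outer variable is a ridge penalty, the inner problem is regularized regression, and the hypergradient is obtained by differentiating through a short unrolled inner loop. stocBiO's contribution is a cheaper stochastic hypergradient estimator, which this sketch does not implement.

```python
import torch

# Bilevel optimization sketch: outer variable = log ridge penalty, inner
# problem = regularized regression on training data, outer objective =
# validation loss, hypergradient via an unrolled inner loop.
torch.manual_seed(0)
Xtr, Xval = torch.randn(80, 10), torch.randn(40, 10)
beta_true = torch.randn(10, 1)
ytr = Xtr @ beta_true + 0.5 * torch.randn(80, 1)
yval = Xval @ beta_true

log_lam = torch.zeros((), requires_grad=True)   # outer variable
outer_opt = torch.optim.Adam([log_lam], lr=0.05)

for outer in range(50):
    w = torch.zeros(10, 1, requires_grad=True)
    # Inner loop: a few gradient steps on the regularized training loss,
    # kept in the autograd graph so the outer gradient can flow through.
    for _ in range(20):
        inner = ((Xtr @ w - ytr) ** 2).mean() + log_lam.exp() * (w ** 2).sum()
        g, = torch.autograd.grad(inner, w, create_graph=True)
        w = w - 0.05 * g
    val = ((Xval @ w - yval) ** 2).mean()
    outer_opt.zero_grad()
    val.backward()
    outer_opt.step()
print("learned lambda:", log_lam.exp().item())
```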
- MementoML: Performance of selected machine learning algorithm configurations on OpenML100 datasets [5.802346990263708]
We present a protocol for generating benchmark data that describes the performance of different ML algorithm configurations.
Data collected in this way can be used to study the factors influencing algorithm performance.
arXiv Detail & Related papers (2020-08-30T13:13:52Z)
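A toy version of the benchmark protocol from the entry above: evaluate a fixed grid of configurations of one algorithm across several datasets and record every (dataset, configuration, score) triple for later meta-analysis. Two built-in sklearn datasets stand in for the OpenML100 suite.

```python
from itertools import product

from sklearn.datasets import load_breast_cancer, load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Fixed grid of configurations for one algorithm, evaluated on every dataset.
datasets = {"breast_cancer": load_breast_cancer(return_X_y=True),
            "digits": load_digits(return_X_y=True)}
grid = list(product([50, 200], [None, 5]))   # (n_estimators, max_depth)

records = []
for ds_name, (X, y) in datasets.items():
    for n_est, depth in grid:
        clf = RandomForestClassifier(n_estimators=n_est, max_depth=depth, random_state=0)
        score = cross_val_score(clf, X, y, cv=3).mean()
        records.append((ds_name, n_est, depth, score))

for r in records:                            # raw material for meta-analysis
    print(r)
```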