Meta-Learning for Symbolic Hyperparameter Defaults
- URL: http://arxiv.org/abs/2106.05767v2
- Date: Fri, 11 Jun 2021 08:55:58 GMT
- Title: Meta-Learning for Symbolic Hyperparameter Defaults
- Authors: Pieter Gijsbers, Florian Pfisterer, Jan N. van Rijn, Bernd Bischl and
Joaquin Vanschoren
- Abstract summary: Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data.
We propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset.
This enables a much faster, but still data-dependent, configuration of the ML algorithm.
- Score: 2.928016570228877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter optimization in machine learning (ML) deals with the problem
of empirically learning an optimal algorithm configuration from data, usually
formulated as a black-box optimization problem. In this work, we propose a
zero-shot method to meta-learn symbolic default hyperparameter configurations
that are expressed in terms of the properties of the dataset. This enables a
much faster, but still data-dependent, configuration of the ML algorithm,
compared to standard hyperparameter optimization approaches. In the past,
symbolic and static default values have usually been obtained as hand-crafted
heuristics. We propose an approach of learning such symbolic configurations as
formulas of dataset properties from a large set of prior evaluations on
multiple datasets by optimizing over a grammar of expressions using an
evolutionary algorithm. We evaluate our method on surrogate empirical
performance models as well as on real data across 6 ML algorithms on more than
100 datasets and demonstrate that our method indeed finds viable symbolic
defaults.
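As an illustration of the approach described above, here is a minimal, self-contained sketch: candidate symbolic defaults are expression trees over dataset meta-features (here just n and p), mutated and selected by a simple (mu + lambda) evolutionary loop. The meta-dataset, grammar, and fitness function are toy stand-ins; the paper scores candidates against surrogate empirical performance models over 100+ datasets.

```python
import math
import random

# Hypothetical meta-dataset: for each prior dataset, its meta-features and the
# hyperparameter value that performed best in earlier (offline) evaluations.
# In the paper, candidates are instead scored via surrogate performance models.
META = [
    {"n": 1000, "p": 20, "best": 0.22},
    {"n": 5000, "p": 50, "best": 0.10},
    {"n": 200, "p": 10, "best": 0.32},
    {"n": 20000, "p": 100, "best": 0.07},
]

FEATURES = ["n", "p"]
UNARY = {"log": lambda a: math.log(max(a, 1e-12)),
         "sqrt": lambda a: math.sqrt(max(a, 0.0))}
BINARY = {"+": lambda a, b: a + b,
          "*": lambda a, b: a * b,
          "/": lambda a, b: a / b if abs(b) > 1e-12 else 0.0}

def random_expr(depth=0):
    """Sample a random expression tree from the toy grammar."""
    r = random.random()
    if depth >= 3 or r < 0.3:  # leaf: a meta-feature or a constant
        return random.choice(FEATURES) if random.random() < 0.5 else round(random.uniform(0.1, 2.0), 2)
    if r < 0.6:
        return (random.choice(list(UNARY)), random_expr(depth + 1))
    return (random.choice(list(BINARY)), random_expr(depth + 1), random_expr(depth + 1))

def evaluate(expr, meta):
    """Evaluate an expression tree on one dataset's meta-features."""
    if isinstance(expr, str):
        return float(meta[expr])
    if isinstance(expr, (int, float)):
        return float(expr)
    if len(expr) == 2:
        return UNARY[expr[0]](evaluate(expr[1], meta))
    return BINARY[expr[0]](evaluate(expr[1], meta), evaluate(expr[2], meta))

def fitness(expr):
    # Negative mean squared error between the symbolic default and best-known value.
    return -sum((evaluate(expr, m) - m["best"]) ** 2 for m in META) / len(META)

def mutate(expr):
    # Crude mutation: occasionally regrow the whole tree, else keep the parent.
    return random_expr() if random.random() < 0.3 else expr

# (mu + lambda) evolution over expression trees.
pop = [random_expr() for _ in range(50)]
for gen in range(100):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(40)]
print("best symbolic default:", max(pop, key=fitness))
```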
Related papers
- Transfer Learning in $\ell_1$ Regularized Regression: Hyperparameter
Selection Strategy based on Sharp Asymptotic Analysis [4.178980693837599]
Transfer learning techniques aim to leverage information from multiple related datasets to improve prediction quality on a target dataset.
Several Lasso-based algorithms have been proposed for this setting, including Trans-Lasso and Pretraining Lasso.
We conduct a thorough, precise study of these algorithms in a high-dimensional setting via an analysis using the replica method.
Our approach reveals a surprisingly simple behavior of the algorithm: Ignoring one of the two types of information transferred to the fine-tuning stage has little effect on generalization performance.
arXiv Detail & Related papers (2024-09-26T10:20:59Z)
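A hedged sketch of the two-stage transfer idea behind the entry above: fit a Lasso on the source data, then fit a second Lasso on the target residuals, so the final estimate is the source estimate plus a sparse correction. The data, penalty values, and single-source setup below are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical data: the source task shares most of its signal with the target.
beta_true = np.zeros(50)
beta_true[:5] = 1.0
X_src = rng.normal(size=(500, 50))
y_src = X_src @ (beta_true + 0.1 * rng.normal(size=50)) + rng.normal(size=500)
X_tgt = rng.normal(size=(60, 50))
y_tgt = X_tgt @ beta_true + rng.normal(size=60)

# Stage 1: rough coefficient estimate from the (larger) source data.
w = Lasso(alpha=0.05).fit(X_src, y_src).coef_

# Stage 2: Lasso on the target *residuals* learns a sparse correction delta,
# so the final estimate is w + delta (the Trans-Lasso-style correction step).
delta = Lasso(alpha=0.1).fit(X_tgt, y_tgt - X_tgt @ w).coef_
beta_hat = w + delta

print("target-only error:", np.linalg.norm(Lasso(alpha=0.1).fit(X_tgt, y_tgt).coef_ - beta_true))
print("transfer error:   ", np.linalg.norm(beta_hat - beta_true))
```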
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
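A minimal sketch of why factored structure helps offline optimization, under the assumption that the factorization is already known (the entry notes the paper can also infer it): fit one surrogate per factor from logged data and optimize each low-dimensional factor independently. The objective, dataset, and grid optimizer are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical offline dataset from a black box that (unknown to a naive
# optimizer) factors as f(x) = fA(x[:2]) + fB(x[2:]).
X = rng.uniform(-2, 2, size=(400, 4))
y = -(X[:, 0] ** 2 + X[:, 1] ** 2) + np.sin(X[:, 2]) * np.cos(X[:, 3])

# Assume the factorization {0,1} | {2,3} is given; fit one surrogate per factor.
# The other factor's contribution acts as additive noise for each surrogate.
fA = RandomForestRegressor(random_state=0).fit(X[:, :2], y)
fB = RandomForestRegressor(random_state=0).fit(X[:, 2:], y)

# Optimize each 2-D factor independently over a dense grid -- exponentially
# cheaper than searching the joint 4-D space for a fine enough grid.
grid = np.stack(np.meshgrid(*[np.linspace(-2, 2, 40)] * 2), -1).reshape(-1, 2)
xA = grid[np.argmax(fA.predict(grid))]
xB = grid[np.argmax(fB.predict(grid))]
print("proposed design:", np.concatenate([xA, xB]))
```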
- AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z)
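A deliberately rough sketch of the low-memory idea only: visit parameters one at a time, keep a single scalar second-moment statistic per tensor, and free each gradient immediately. The real AdaLomo uses factored (Adafactor-style) statistics and fuses the update into the backward pass; nothing below is its exact update rule.

```python
import torch

def low_memory_adaptive_step(model, state, lr=1e-3, beta2=0.99, eps=1e-8):
    """Rough illustration of a low-memory adaptive update: one scalar
    second-moment statistic per parameter tensor instead of Adam's full
    per-element moments, and gradients released as soon as they are used."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            v = state.get(name, torch.zeros((), device=p.device))
            v = beta2 * v + (1 - beta2) * p.grad.pow(2).mean()  # scalar statistic
            state[name] = v
            p.add_(p.grad, alpha=-lr / (v.sqrt().item() + eps))  # adaptive step
            p.grad = None  # release gradient memory right away

model = torch.nn.Linear(10, 1)
state = {}
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
low_memory_adaptive_step(model, state)
```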
- DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems [0.755972004983746]
We propose a feature extraction method that describes the trajectories of optimization algorithms using simple statistics.
We demonstrate that the proposed DynamoRep features capture enough information to identify the problem class on which the optimization algorithm is running.
arXiv Detail & Related papers (2023-06-08T06:57:07Z)
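A sketch of the trajectory-feature idea from the entry above: run a simple population-based search and record elementary statistics (min/max/mean/std) of the population at every iteration, concatenated into a fixed-length feature vector for a downstream problem-class classifier. The search loop and test problems are stand-ins for the standard algorithms and BBOB problems used in the paper.

```python
import numpy as np

def trajectory_features(f, dim=5, pop_size=20, iters=10, seed=0):
    """Run a crude elitist random search and collect per-iteration statistics
    of the population's objective values and coordinates as one flat vector."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    feats = []
    for _ in range(iters):
        fx = np.apply_along_axis(f, 1, pop)
        feats += [fx.min(), fx.max(), fx.mean(), fx.std(),
                  pop.mean(), pop.std()]
        elite = pop[np.argsort(fx)[: pop_size // 4]]  # keep best quarter
        pop = np.repeat(elite, 4, axis=0) + rng.normal(0, 0.5, size=(pop_size, dim))
    return np.array(feats)

sphere = lambda x: float(np.sum(x ** 2))
rastrigin = lambda x: float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))
# Feature vectors for two problem classes -- inputs to a downstream classifier.
print(trajectory_features(sphere)[:6])
print(trajectory_features(rastrigin)[:6])
```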
- Numerical Optimizations for Weighted Low-rank Estimation on Language Model [73.12941276331316]
Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices.
Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption.
We show that our method can outperform current state-of-the-art methods on neural language models.
arXiv Detail & Related papers (2022-11-02T00:58:02Z)
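A minimal sketch of importance-weighted low-rank approximation: scale rows by (hypothetical) importance weights before the SVD and undo the scaling afterwards, so important rows are reconstructed more faithfully; uniform weights recover standard truncated SVD. The paper derives its weights from the task loss and its algorithm differs in detail.

```python
import numpy as np

def weighted_svd_compress(W, row_importance, rank):
    """Importance-weighted truncated SVD via row scaling: factorize D^{1/2} W,
    then fold the scaling back into the left factor."""
    d = np.sqrt(row_importance)
    U, s, Vt = np.linalg.svd(d[:, None] * W, full_matrices=False)
    A = (U[:, :rank] * s[:rank]) / d[:, None]  # left factor, unscaled
    B = Vt[:rank]                              # right factor
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
imp = rng.uniform(0.1, 10.0, size=64)          # hypothetical per-row importances
A, B = weighted_svd_compress(W, imp, rank=8)
err = np.abs(W - A @ B)
print("importance-weighted mean abs error:", float((imp * err.mean(axis=1)).mean()))
```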
- MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning [19.5917119072985]
We cast contrastive learning as a binary classification problem: predict whether a given pair is positive or not.
The proposed method outperforms state-of-the-art algorithms on benchmark datasets such as STL-10, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2021-11-24T17:51:29Z)
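A sketch of the binary-classification framing only: treat temperature-scaled cosine similarities of embedding pairs as logits, label same-image pairs 1 and cross-image pairs 0, and train with binary cross-entropy. MIO's actual loss, mutual-information connection, and architecture are not reproduced here.

```python
import torch
import torch.nn.functional as F

def binary_contrastive_loss(z1, z2, temperature=0.1):
    """Contrastive learning as binary pair classification: the (i, j) entry of
    the similarity matrix is the logit for 'views i and j come from the same
    image'; the diagonal holds the positive pairs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature
    labels = torch.eye(len(z1), device=z1.device)  # diagonal pairs are positive
    return F.binary_cross_entropy_with_logits(logits, labels)

# Toy usage with random "embeddings" of two views of the same 8 images.
z = torch.randn(8, 128)
loss = binary_contrastive_loss(z + 0.1 * torch.randn_like(z), z)
print(loss.item())
```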
- Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
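A compact sketch of the conservative-training idea from the entry above: alongside the regression loss, find inputs by gradient ascent on the current model and penalize high predictions there, pushing the model to lower-bound the true objective off-support. The step sizes, penalty weight, and toy design problem are assumptions, not the paper's settings.

```python
import torch

# Conservative objective model sketch: regression loss plus a penalty that
# lowers predictions at adversarially ascended (out-of-distribution) inputs.
model = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.rand(256, 4)                         # hypothetical offline designs
y = -((X - 0.5) ** 2).sum(1, keepdim=True)     # and their measured scores

for step in range(200):
    # Adversarial inputs: a few ascent steps on the current model from data.
    x_adv = X.clone().requires_grad_(True)
    for _ in range(3):
        grad, = torch.autograd.grad(model(x_adv).sum(), x_adv)
        x_adv = (x_adv + 0.05 * grad).detach().requires_grad_(True)
    mse = ((model(X) - y) ** 2).mean()
    conservatism = model(x_adv).mean() - model(X).mean()  # penalize OOD optimism
    loss = mse + 0.5 * conservatism
    opt.zero_grad()
    loss.backward()
    opt.step()
print("trained conservative model; final penalty:", conservatism.item())
```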
- Data-driven Weight Initialization with Sylvester Solvers [72.11163104763071]
We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
arXiv Detail & Related papers (2021-05-02T07:33:16Z)
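A hedged sketch of the "initialize by solving, not sampling" idea: choose a layer's weights as the least-squares map from a batch of real input activations to a (here: random) target output. The paper solves related Sylvester equations layer by layer; this illustrates only the data-driven flavor.

```python
import numpy as np

def least_squares_init(acts_in, acts_target):
    """Pick a linear layer's weights by solving a least-squares problem so
    that, on a batch of real inputs, the layer's pre-activations match a
    desired target, instead of sampling weights at random."""
    W, *_ = np.linalg.lstsq(acts_in, acts_target, rcond=None)
    return W.T  # (out_dim, in_dim), ready to copy into a linear layer

rng = np.random.default_rng(0)
acts_in = rng.normal(size=(512, 64))   # activations entering the layer
target = rng.normal(size=(512, 32))    # hypothetical target pre-activations
W0 = least_squares_init(acts_in, target)
print(W0.shape, "residual:", float(np.linalg.norm(acts_in @ W0.T - target)))
```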
- Meta Learning Black-Box Population-Based Optimizers [0.0]
We propose the use of meta-learning to infer population-based black-box optimizers.
We show that the meta-loss function encourages a learned algorithm to alter its search behavior so that it can easily adapt to a new context.
arXiv Detail & Related papers (2021-03-05T08:13:25Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems, such as meta-learning and hyperparameter optimization.
We propose stocBiO, a novel stochastic bilevel optimizer with a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
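A small bilevel example in the spirit of the entry above: the outer variable is a ridge penalty, the inner problem is regularized regression, and the hypergradient is obtained by differentiating through a short unrolled inner loop. stocBiO's contribution is a cheaper stochastic hypergradient estimator, which this sketch does not implement.

```python
import torch

# Bilevel optimization sketch: outer variable = log ridge penalty, inner
# problem = regularized regression on training data, outer objective =
# validation loss, hypergradient via an unrolled inner loop.
torch.manual_seed(0)
Xtr, Xval = torch.randn(80, 10), torch.randn(40, 10)
beta_true = torch.randn(10, 1)
ytr = Xtr @ beta_true + 0.5 * torch.randn(80, 1)
yval = Xval @ beta_true

log_lam = torch.zeros((), requires_grad=True)   # outer variable
outer_opt = torch.optim.Adam([log_lam], lr=0.05)

for outer in range(50):
    w = torch.zeros(10, 1, requires_grad=True)
    # Inner loop: a few gradient steps on the regularized training loss,
    # kept in the autograd graph so the outer gradient can flow through.
    for _ in range(20):
        inner = ((Xtr @ w - ytr) ** 2).mean() + log_lam.exp() * (w ** 2).sum()
        g, = torch.autograd.grad(inner, w, create_graph=True)
        w = w - 0.05 * g
    val = ((Xval @ w - yval) ** 2).mean()
    outer_opt.zero_grad()
    val.backward()
    outer_opt.step()
print("learned lambda:", log_lam.exp().item())
```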
- MementoML: Performance of selected machine learning algorithm configurations on OpenML100 datasets [5.802346990263708]
We present a protocol for generating benchmark data that describes the performance of different ML algorithm configurations.
Data collected in this way can be used to study the factors influencing algorithm performance.
arXiv Detail & Related papers (2020-08-30T13:13:52Z)
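A toy version of the benchmark protocol from the entry above: evaluate a fixed grid of configurations of one algorithm across several datasets and record every (dataset, configuration, score) triple for later meta-analysis. Two built-in sklearn datasets stand in for the OpenML100 suite.

```python
from itertools import product

from sklearn.datasets import load_breast_cancer, load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Fixed grid of configurations for one algorithm, evaluated on every dataset.
datasets = {"breast_cancer": load_breast_cancer(return_X_y=True),
            "digits": load_digits(return_X_y=True)}
grid = list(product([50, 200], [None, 5]))   # (n_estimators, max_depth)

records = []
for ds_name, (X, y) in datasets.items():
    for n_est, depth in grid:
        clf = RandomForestClassifier(n_estimators=n_est, max_depth=depth, random_state=0)
        score = cross_val_score(clf, X, y, cv=3).mean()
        records.append((ds_name, n_est, depth, score))

for r in records:                            # raw material for meta-analysis
    print(r)
```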