Hyperparameter Optimization with Differentiable Metafeatures
- URL: http://arxiv.org/abs/2102.03776v1
- Date: Sun, 7 Feb 2021 11:06:31 GMT
- Title: Hyperparameter Optimization with Differentiable Metafeatures
- Authors: Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka
- Abstract summary: We propose a cross dataset surrogate model called Differentiable Metafeature-based Surrogate (DMFBS)
In contrast to existing models, DMFBS i) integrates a differentiable metafeature extractor and ii) is optimized using a novel multi-task loss.
We compare DMFBS against several recent models for HPO on three large meta-datasets and show that it consistently outperforms all of them with an average 10% improvement.
- Score: 5.586191108738563
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metafeatures, or dataset characteristics, have been shown to improve the
performance of hyperparameter optimization (HPO). Conventionally, metafeatures
are precomputed and used to measure the similarity between datasets, leading to
a better initialization of HPO models. In this paper, we propose a cross
dataset surrogate model called Differentiable Metafeature-based Surrogate
(DMFBS), that predicts the hyperparameter response, i.e. validation loss, of a
model trained on the dataset at hand. In contrast to existing models, DMFBS i)
integrates a differentiable metafeature extractor and ii) is optimized using a
novel multi-task loss, linking manifold regularization with a dataset
similarity measure learned via an auxiliary dataset identification meta-task,
effectively enforcing the response approximation for similar datasets to be
similar. We compare DMFBS against several recent models for HPO on three large
meta-datasets and show that it consistently outperforms all of them with an
average 10% improvement. Finally, we provide an extensive ablation study that
examines the different components of our approach.
Related papers
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z) - FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis [0.7751705157998379]
The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP.
Model soups averages multiple fine-tuned models aiming to improve performance on In-Domain (ID) tasks and enhance robustness against Out-of-Distribution (OOD) datasets.
We propose a hierarchical merging approach that involves local and global aggregation of models at various levels.
arXiv Detail & Related papers (2024-03-20T06:48:48Z) - Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [62.467716468917224]
We propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it.
Our method transfers knowledge about the performance of many pretrained models on a series of datasets.
We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset.
arXiv Detail & Related papers (2023-06-06T16:15:26Z) - Which is the best model for my data? [0.0]
The proposed meta-learning approach relies on machine learning and involves four major steps.
We present a collection of 62 meta-features that address the problem of information cancellation when aggregation measure values involving positive and negative measurements.
We show that our meta-learning approach can correctly predict an optimal model for 91% of the synthetic datasets and for 87% of the real-world datasets.
arXiv Detail & Related papers (2022-10-26T13:15:43Z) - Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z) - Conceptually Diverse Base Model Selection for Meta-Learners in Concept
Drifting Data Streams [3.0938904602244355]
We present a novel approach for estimating the conceptual similarity of base models, which is calculated using the Principal Angles (PAs) between their underlying subspaces.
We evaluate these methods against thresholding using common ensemble pruning metrics, namely predictive performance and Mutual Information (MI) in the context of online Transfer Learning (TL)
Our results show that conceptual similarity thresholding has a reduced computational overhead, and yet yields comparable predictive performance to thresholding using predictive performance and MI.
arXiv Detail & Related papers (2021-11-29T13:18:53Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - Improved Semantic Role Labeling using Parameterized Neighborhood Memory
Adaptation [22.064890647610348]
We propose a parameterized neighborhood memory adaptive (PNMA) method that uses a parameterized representation of the nearest neighbors of tokens in a memory of activations.
We empirically show that PNMA consistently improves the SRL performance of the base model irrespective of types of word embeddings.
arXiv Detail & Related papers (2020-11-29T22:51:25Z) - Meta-learning framework with applications to zero-shot time-series
forecasting [82.61728230984099]
This work provides positive evidence using a broad meta-learning framework.
residual connections act as a meta-learning adaptation mechanism.
We show that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining.
arXiv Detail & Related papers (2020-02-07T16:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.