Meta-Learning to Improve Pre-Training
- URL: http://arxiv.org/abs/2111.01754v1
- Date: Tue, 2 Nov 2021 17:26:50 GMT
- Title: Meta-Learning to Improve Pre-Training
- Authors: Aniruddh Raghu, Jonathan Lorraine, Simon Kornblith, Matthew McDermott,
David Duvenaud
- Abstract summary: Pre-training (PT) followed by fine-tuning (FT) is an effective method for training neural networks.
PT can incorporate various design choices such as task and data reweighting strategies, augmentation policies, and noise models.
We propose an efficient, gradient-based algorithm to meta-learn PT hyperparameters.
- Score: 38.75981465367226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-training (PT) followed by fine-tuning (FT) is an effective method for
training neural networks, and has led to significant performance improvements
in many domains. PT can incorporate various design choices such as task and
data reweighting strategies, augmentation policies, and noise models, all of
which can significantly impact the quality of representations learned. The
hyperparameters introduced by these strategies therefore must be tuned
appropriately. However, setting the values of these hyperparameters is
challenging. Most existing methods either struggle to scale to high dimensions,
are too slow and memory-intensive, or cannot be directly applied to the
two-stage PT and FT learning process. In this work, we propose an efficient,
gradient-based algorithm to meta-learn PT hyperparameters. We formalize the PT
hyperparameter optimization problem and propose a novel method to obtain PT
hyperparameter gradients by combining implicit differentiation and
backpropagation through unrolled optimization. We demonstrate that our method
improves predictive performance on two real-world domains. First, we optimize
high-dimensional task weighting hyperparameters for multitask pre-training on
protein-protein interaction graphs and improve AUROC by up to 3.9%. Second, we
optimize a data augmentation neural network for self-supervised PT with SimCLR
on electrocardiography data and improve AUROC by up to 1.9%.
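As a rough illustration of the approach described in the abstract, the sketch below combines the two ingredients it names: implicit differentiation (via a truncated Neumann series) through the pre-training stage, and backpropagation through a short unrolled fine-tuning loop. The callables `pretrain_loss`, `finetune_loss`, and `val_loss`, the plain-SGD updates, and the exact split between the implicit and unrolled stages are assumptions for illustration, not the paper's released code.

```python
import torch

def pt_hyperparameter_grad(phi, w_pt, pretrain_loss, finetune_loss, val_loss,
                           ft_steps=5, ft_lr=1e-2, k=10, alpha=1e-2):
    """Approximate d(validation loss after fine-tuning) / d(phi).

    phi           : PT hyperparameters (tensor with requires_grad=True)
    w_pt          : list of pre-trained parameter tensors (treated as a PT optimum)
    pretrain_loss : callable (w, phi) -> scalar PT loss        (hypothetical)
    finetune_loss : callable (w) -> scalar FT training loss    (hypothetical)
    val_loss      : callable (w) -> scalar FT validation loss  (hypothetical)
    """
    # (1) Backprop through a short, differentiable fine-tuning unroll (plain SGD).
    w_pt = [p.detach().requires_grad_(True) for p in w_pt]
    w = list(w_pt)
    for _ in range(ft_steps):
        g = torch.autograd.grad(finetune_loss(w), w, create_graph=True)
        w = [wi - ft_lr * gi for wi, gi in zip(w, g)]
    v1 = torch.autograd.grad(val_loss(w), w_pt)                 # dL_val / dw_PT

    # (2) Implicit differentiation through pre-training: approximate
    #     H_PT^{-1} v1 with a truncated Neumann series of Hessian-vector products.
    g_pt = torch.autograd.grad(pretrain_loss(w_pt, phi), w_pt, create_graph=True)
    v = [x.clone() for x in v1]
    acc = [x.clone() for x in v1]
    for _ in range(k):
        hvp = torch.autograd.grad(g_pt, w_pt, grad_outputs=v, retain_graph=True)
        v = [vi - alpha * hi for vi, hi in zip(v, hvp)]
        acc = [ai + vi for ai, vi in zip(acc, v)]
    ihvp = [alpha * a for a in acc]                             # ~ H_PT^{-1} v1

    # (3) Hypergradient: -(d^2 L_PT / dphi dw) @ H_PT^{-1} @ dL_val/dw_PT.
    mixed = sum((gi * hi).sum() for gi, hi in zip(g_pt, ihvp))
    (hyper_grad,) = torch.autograd.grad(mixed, phi)
    return -hyper_grad
```

The Neumann series avoids ever forming or inverting the pre-training Hessian, which is what allows hypergradients for high-dimensional PT hyperparameters such as per-task weights or augmentation-network parameters.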
Related papers
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
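For the adaptive-preference-scaling entry above, the snippet below shows one generic way a DRO-style reweighting can be attached to a Bradley-Terry preference loss: pairs with larger loss receive proportionally more weight through a KL-regularized inner maximization. This is only an illustrative instantiation, not the paper's exact adaptive scaling.

```python
import torch
import torch.nn.functional as F

def dro_preference_loss(reward_chosen, reward_rejected, tau=1.0):
    """reward_chosen / reward_rejected: [batch] reward-model scores for each pair.
    tau controls the KL-regularized DRO reweighting; a large tau recovers the
    ordinary mean Bradley-Terry loss."""
    # Per-pair Bradley-Terry negative log-likelihood.
    per_pair = -F.logsigmoid(reward_chosen - reward_rejected)
    # Closed-form KL-DRO weights: harder pairs receive proportionally more weight.
    weights = torch.softmax(per_pair.detach() / tau, dim=0)
    return (weights * per_pair).sum()
```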
- Deep Ranking Ensembles for Hyperparameter Optimization [9.453554184019108]
We present a novel method that meta-learns neural network surrogates optimized for ranking the configurations' performances while modeling their uncertainty via ensembling.
In a large-scale experimental protocol comprising 12 baselines, 16 HPO search spaces and 86 datasets/tasks, we demonstrate that our method achieves new state-of-the-art results in HPO.
arXiv Detail & Related papers (2023-03-27T13:52:40Z)
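For the ranking-ensemble entry above, the sketch below shows an ensemble of small MLP surrogates trained with a pairwise ranking loss over hyperparameter configurations, with ensemble disagreement serving as an uncertainty estimate. The architecture and loss are hypothetical simplifications, not the paper's meta-learned surrogates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RankingSurrogateEnsemble(nn.Module):
    """Ensemble of small MLPs that score hyperparameter configurations."""
    def __init__(self, dim, n_members=5, hidden=64):
        super().__init__()
        self.members = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_members))

    def forward(self, x):                          # x: [batch, dim] configurations
        scores = torch.stack([m(x).squeeze(-1) for m in self.members])
        return scores.mean(0), scores.std(0)       # mean score, ensemble disagreement

def pairwise_ranking_loss(scores, y):
    """scores: predicted scores for a batch of configs; y: observed performances.
    Penalizes pairs whose predicted ordering disagrees with the observed ordering."""
    diff_pred = scores.unsqueeze(0) - scores.unsqueeze(1)       # s_i - s_j
    diff_true = (y.unsqueeze(0) > y.unsqueeze(1)).float()       # 1 if y_i beats y_j
    mask = ~torch.eye(len(scores), dtype=torch.bool)            # ignore i == j
    return F.binary_cross_entropy_with_logits(diff_pred[mask], diff_true[mask])
```

Each member could be trained on a bootstrap resample of the observed (configuration, performance) pairs, and an acquisition step would trade off the mean score against the disagreement term.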
- CPMLHO: Hyperparameter Tuning via Cutting Plane and Mixed-Level Optimization [24.39326333982495]
CPMLHO is a new hyperparameter optimization method that uses the cutting-plane method and a mixed-level objective function.
Compared to existing methods, ours automatically updates the hyperparameters during training.
arXiv Detail & Related papers (2022-12-11T07:46:19Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyperparameter tuning.
We show that using gradient-based data subsets for hyperparameter tuning achieves significantly faster turnaround times and speedups of 3×-30×.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
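For the subset-selection entry above, the sketch below illustrates the general idea of gradient-based subset selection: greedily pick examples whose summed gradients approximate the full-data gradient, then run hyperparameter search on that subset. The greedy residual-matching rule and the use of flattened per-example gradients are simplifying assumptions, not the paper's exact algorithm.

```python
import torch

def greedy_gradient_matching(per_example_grads, budget):
    """per_example_grads: [n, d] flattened per-example gradients (e.g. w.r.t. the
    final linear layer only, for tractability). Returns indices of a subset of
    size `budget` whose mean gradient approximates the full-data mean gradient."""
    target = per_example_grads.mean(0)           # full-data mean gradient
    residual = target.clone()
    chosen = []
    for _ in range(budget):
        scores = per_example_grads @ residual    # alignment with what is still missing
        if chosen:
            scores[chosen] = -float("inf")       # never pick the same example twice
        chosen.append(int(scores.argmax()))
        residual = target - per_example_grads[chosen].mean(0)
    return chosen

# Hyperparameter search would then run with training restricted to `chosen`,
# trading a small approximation error for a much faster tuning turnaround.
```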
- Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation [0.0]
We develop an approximate hypergradient-based hyperparameter optimiser.
It requires only one training episode, with no restarts.
We also provide a motivating argument for convergence to the true hypergradient.
arXiv Detail & Related papers (2021-10-20T09:57:57Z)
- Efficient Hyperparameter Optimization for Physics-based Character Animation [1.2183405753834562]
We propose a novel Curriculum-based Multi-Fidelity Bayesian Optimization framework (CMFBO) for efficient hyperparameter optimization of DRL-based character control systems.
We show that our algorithm yields at least a 5x efficiency gain compared to the author-released settings in DeepMimic.
arXiv Detail & Related papers (2021-04-26T06:46:36Z)
- On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning [27.36718899899319]
Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner.
MBRL typically requires significant human expertise before it can be applied to new problems and domains.
arXiv Detail & Related papers (2021-02-26T18:57:47Z)
- Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG).
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update the hyperparameters.
arXiv Detail & Related papers (2021-02-17T21:03:05Z)
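For the zeroth-order entry above, the sketch below estimates the hyper-gradient purely from function evaluations: perturb the hyperparameters along random directions, rerun training, and average finite-difference estimates. Here `train_and_validate` is a hypothetical black-box callable, and the averaging scheme is a generic zeroth-order estimator rather than HOZOG's exact formulation.

```python
import torch

def zeroth_order_hypergrad(train_and_validate, lam, mu=1e-2, n_dirs=8):
    """lam: hyperparameter vector (1-D tensor). `train_and_validate` maps a
    hyperparameter setting to a scalar validation loss by running the black-box
    training algorithm. Returns an estimate of d(validation loss)/d(lam)."""
    base = train_and_validate(lam)                    # one full training run
    grad = torch.zeros_like(lam)
    for _ in range(n_dirs):
        u = torch.randn_like(lam)
        u = u / u.norm()                              # random unit direction
        grad += (train_and_validate(lam + mu * u) - base) / mu * u
    return grad / n_dirs

# Hyperparameter descent loop (sketch):
#   for _ in range(T):
#       lam = lam - eta * zeroth_order_hypergrad(train_and_validate, lam)
```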
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
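For the real-time-recurrent-learning entry above, the toy sketch below tunes a single scalar learning rate online by carrying a forward-mode sensitivity Z_t = dw_t/d(lam) alongside training, mirroring the RTRL-style recursion the entry alludes to. The paper tunes richer hyperparameter sets jointly with the network parameters; the scalar case and plain SGD here are simplifying assumptions.

```python
import torch

def online_lr_tuning(w, lam, train_loss, val_loss, steps=100, hyper_lr=1e-3):
    """w: list of parameter tensors with requires_grad=True; lam: scalar learning rate.
    Carries Z_t = dw_t/d(lam) forward in time and uses it to update lam online."""
    Z = [torch.zeros_like(p) for p in w]
    for _ in range(steps):
        g = torch.autograd.grad(train_loss(w), w, create_graph=True)
        # Hessian-vector product H_t Z_t, needed for the sensitivity recursion.
        hvp = torch.autograd.grad(g, w, grad_outputs=Z)
        # Forward-mode recursion: Z_{t+1} = Z_t - g_t - lam * H_t Z_t
        Z = [z - gi.detach() - lam * h for z, gi, h in zip(Z, g, hvp)]
        # Plain SGD step with the current learning rate.
        with torch.no_grad():
            for p, gi in zip(w, g):
                p -= lam * gi
        # Hypergradient of the validation loss w.r.t. lam, then a hyper-step on lam.
        v = torch.autograd.grad(val_loss(w), w)
        hyper_grad = sum((vi * z).sum() for vi, z in zip(v, Z))
        lam = max(lam - hyper_lr * float(hyper_grad), 1e-6)   # keep lam positive
    return w, lam
```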
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.