Guarantees for Tuning the Step Size using a Learning-to-Learn Approach
- URL: http://arxiv.org/abs/2006.16495v2
- Date: Fri, 11 Jun 2021 04:21:42 GMT
- Title: Guarantees for Tuning the Step Size using a Learning-to-Learn Approach
- Authors: Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge
- Abstract summary: We give meta-optimization guarantees for the learning-to-learn approach on a simple problem of tuning the step size for quadratic loss.
Although there is a way to design the meta-objective so that the meta-gradient remains bounded, computing the meta-gradient directly using backpropagation leads to numerical issues.
We also characterize when it is necessary to compute the meta-objective on a separate validation set to ensure the generalization performance of the learned optimizer.
- Score: 18.838453594698166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Choosing the right parameters for optimization algorithms is often the key to
their success in practice. Solving this problem using a learning-to-learn
approach -- using meta-gradient descent on a meta-objective based on the
trajectory that the optimizer generates -- was recently shown to be effective.
However, the meta-optimization problem is difficult. In particular, the
meta-gradient can often explode/vanish, and the learned optimizer may not have
good generalization performance if the meta-objective is not chosen carefully.
In this paper we give meta-optimization guarantees for the learning-to-learn
approach on a simple problem of tuning the step size for quadratic loss. Our
results show that the na\"ive objective suffers from the meta-gradient
explosion/vanishing problem. Although there is a way to design the
meta-objective so that the meta-gradient remains polynomially bounded,
computing the meta-gradient directly using backpropagation leads to numerical
issues. We also characterize when it is necessary to compute the meta-objective
on a separate validation set to ensure the generalization performance of the
learned optimizer. Finally, we verify our results empirically and show that a
similar phenomenon appears even for more complicated learned optimizers
parametrized by neural networks.
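As a concrete illustration of this setting, here is a minimal sketch (not the authors' code; the 1-D quadratic, unrolling horizons, and learning rates are illustrative assumptions). For f(w) = 0.5 * lam * w^2, gradient descent gives w_t = (1 - eta*lam)^t * w_0, so the naive meta-objective F(eta) = f(w_t) has a closed-form meta-gradient that explodes or vanishes exponentially in the horizon t:

```python
import numpy as np

# Sketch: tuning the step size eta for gradient descent on a 1-D quadratic.
# The meta-gradient of the naive meta-objective F(eta) = f(w_t) scales like
# (1 - eta*lam)**(2t - 1), i.e. exponentially in the unrolling horizon t.

def meta_gradient(eta, lam=1.0, w0=1.0, t=50):
    # dF/deta = -t * lam**2 * w0**2 * (1 - eta*lam)**(2t - 1)
    return -t * lam**2 * w0**2 * (1.0 - eta * lam) ** (2 * t - 1)

print(meta_gradient(2.5))  # |1 - eta*lam| > 1: the meta-gradient explodes
print(meta_gradient(0.5))  # 0 < |1 - eta*lam| < 1: the meta-gradient vanishes

# With a short horizon the meta-problem is benign, and meta-gradient descent
# recovers the optimal step size eta* = 1/lam = 1.0:
eta, meta_lr = 0.3, 0.05
for _ in range(200):
    eta -= meta_lr * meta_gradient(eta, t=1)
print(eta)  # close to 1.0
```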
Related papers
- Fast Adaptation with Kernel and Gradient based Meta Leaning [4.763682200721131]
We propose two algorithms to improve both the inner and outer loops of Model-Agnostic Meta-Learning (MAML).
Our first algorithm redefines the optimization problem in the function space to update the model using closed-form solutions.
In the outer loop, the second algorithm adjusts the learning of the meta-learner by assigning weights to the losses from each task of the inner loop.
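For context, a minimal first-order sketch of MAML's inner/outer loop structure (illustrative only: scalar linear-regression tasks, plain gradient steps, and the first-order approximation are assumptions, not the entry's algorithms):

```python
import numpy as np

# Tasks are 1-D linear regressions y = a * x; the model is one weight w.
rng = np.random.default_rng(0)

def task_grad(w, a, n=20):
    x = rng.normal(size=n)
    return np.mean(2 * (w * x - a * x) * x)  # d/dw of the squared error

w, inner_lr, outer_lr = 0.0, 0.1, 0.01
for _ in range(1000):
    a = rng.normal(loc=1.0, scale=0.5)       # sample a task
    w_fast = w - inner_lr * task_grad(w, a)  # inner loop: task adaptation
    w -= outer_lr * task_grad(w_fast, a)     # outer loop: first-order meta-update
print(w)  # drifts toward the mean task slope (~1.0)
```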
arXiv Detail & Related papers (2024-11-01T07:05:03Z)
- Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators [13.343937277604892]
We study the problem of using meta-learning to deal with uncertainty and heterogeneity in ergodic linear quadratic regulators.
We propose an algorithm that omits the estimation of the policy Hessian and applies to learning a set of heterogeneous but similar linear dynamical systems.
We provide a convergence result for the exact gradient descent process by analyzing the boundedness and smoothness of the gradient for the meta-objective.
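For reference, a two-point zeroth-order gradient estimate of the standard kind alluded to here (a textbook construction under a toy quadratic cost, not the entry's exact estimator): it uses only cost evaluations, so no policy gradient or Hessian is ever formed.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(theta):
    # Stand-in for an LQR rollout cost; a quadratic keeps the demo simple.
    return float(theta @ theta)

def zo_grad(theta, r=0.1):
    # Two-point estimate along a random unit direction u:
    # g = d * (cost(theta + r*u) - cost(theta - r*u)) / (2r) * u
    u = rng.normal(size=theta.shape)
    u /= np.linalg.norm(u)
    return theta.size * (cost(theta + r * u) - cost(theta - r * u)) / (2 * r) * u

theta = np.ones(4)
for _ in range(500):
    theta -= 0.05 * zo_grad(theta)
print(theta)  # decays toward the optimum at zero
```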
arXiv Detail & Related papers (2024-05-27T17:26:36Z)
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
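As background (stated for reference, not the entry's generalized form): with a proximally regularized inner problem, the implicit function theorem yields the meta-gradient without backpropagating through the inner optimization trajectory, as in iMAML:

```latex
\phi^*(\theta) \;=\; \arg\min_{\phi}\; \mathcal{L}(\phi) + \tfrac{\lambda}{2}\,\lVert \phi - \theta \rVert^2
\quad\Longrightarrow\quad
\frac{d\phi^*}{d\theta} \;=\; \Big( I + \tfrac{1}{\lambda}\,\nabla^2_{\phi}\mathcal{L}(\phi^*) \Big)^{-1}.
```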
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
- Meta Mirror Descent: Optimiser Learning for Fast Convergence [85.98034682899855]
We take a different perspective starting from mirror descent rather than gradient descent, and meta-learning the corresponding Bregman divergence.
Within this paradigm, we formalise a novel meta-learning objective of minimising the regret bound of learning.
Unlike many meta-learned optimisers, it also supports convergence and generalisation guarantees and uniquely does so without requiring validation data.
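For context, the standard mirror descent update with a Bregman divergence B_\psi (textbook form; the entry's contribution, meta-learning \psi itself, is not shown here):

```latex
x_{t+1} \;=\; \arg\min_{x}\; \eta\,\langle g_t, x \rangle + B_\psi(x, x_t),
\qquad
B_\psi(x, y) \;=\; \psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle.
```

Gradient descent is recovered as the special case \psi(x) = \tfrac{1}{2}\lVert x \rVert^2.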
arXiv Detail & Related papers (2022-03-05T11:41:13Z)
- Bootstrapped Meta-Learning [48.017607959109924]
We propose an algorithm that tackles a challenging meta-optimisation problem by letting the meta-learner teach itself.
The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric.
We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities.
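A schematic sketch of the bootstrapped-target idea (illustrative only: the stand-in meta-gradient, lookahead length, and squared metric are assumptions, not the entry's algorithm):

```python
# Build a target by running the meta-update a few cheap lookahead steps,
# then pull the meta-parameter toward that (fixed) target.

def meta_grad(eta):
    return eta - 1.0  # stand-in meta-gradient with optimum at eta = 1.0

eta, lr = 0.2, 0.5
for _ in range(50):
    target = eta
    for _ in range(5):              # bootstrap: lookahead meta-steps
        target -= 0.1 * meta_grad(target)
    eta -= lr * (eta - target)      # minimise distance to the target
print(eta)  # approaches 1.0
```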
arXiv Detail & Related papers (2021-09-09T18:29:05Z)
- Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose conservative objective models (COMs), a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
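Schematically, the conservative training objective has the following shape (a simplified rendering; the sampler \mu(\theta) mines high-value inputs by gradient ascent on the learned model, and the weighting details are omitted):

```latex
\min_{\theta}\;
\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[(\hat f_\theta(x) - y)^2\big]
\;+\;
\alpha\Big( \mathbb{E}_{x^- \sim \mu(\theta)}\big[\hat f_\theta(x^-)\big]
\;-\; \mathbb{E}_{x \sim \mathcal{D}}\big[\hat f_\theta(x)\big] \Big).
```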
arXiv Detail & Related papers (2021-07-14T17:55:28Z)
- Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
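A hedged sketch of solving the adaptation analytically with a kernel (kernel ridge regression; the RBF kernel stands in for the NTK, and the toy task and regularizer are illustrative assumptions):

```python
import numpy as np

def adapt_predict(K_support, K_query, y_support, reg=1e-3):
    # Closed-form adaptation: alpha = (K + reg*I)^{-1} y, then predict.
    alpha = np.linalg.solve(K_support + reg * np.eye(len(y_support)), y_support)
    return K_query @ alpha

k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)  # RBF kernel
xs = np.linspace(0.0, 3.0, 8)   # support inputs of one task
xq = np.array([1.5, 2.5])       # query inputs
print(adapt_predict(k(xs, xs), k(xq, xs), np.sin(xs)))  # predictions
print(np.sin(xq))                                       # ground truth
```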
arXiv Detail & Related papers (2021-02-07T20:53:23Z)
- Modeling and Optimization Trade-off in Meta-learning [23.381986209234164]
We introduce and rigorously define the trade-off between accurate modeling and ease of optimization in meta-learning.
Taking MAML as a representative meta-learning algorithm, we theoretically characterize the trade-off for general non-convex risk functions as well as linear regression.
We also empirically solve the trade-off for meta-reinforcement learning benchmarks.
arXiv Detail & Related papers (2020-10-24T15:32:08Z)