Learnable Loss Geometries with Mirror Descent for Scalable and Convergent Meta-Learning
- URL: http://arxiv.org/abs/2509.02418v1
- Date: Tue, 02 Sep 2025 15:23:21 GMT
- Title: Learnable Loss Geometries with Mirror Descent for Scalable and Convergent Meta-Learning
- Authors: Yilang Zhang, Bingcong Li, Georgios B. Giannakis
- Abstract summary: Meta-learning is a principled approach to learning a new task with limited data records. We learn a versatile distance-generating function whose induced nonlinear mirror map speeds up convergence of the per-task training. Tests on few-shot learning datasets demonstrate the superior empirical performance of the novel algorithm.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Utilizing task-invariant knowledge acquired from related tasks as prior information, meta-learning offers a principled approach to learning a new task with limited data records. Sample-efficient adaptation of this prior information is a major challenge facing meta-learning, as it enables training the sought task-specific model with just a few optimization steps. Past works deal with this challenge through preconditioning that speeds up convergence of the per-task training. Though effective in representing locally quadratic loss curvatures, simple linear preconditioning is hardly potent with complex loss geometries. Instead of relying on a quadratic distance metric, the present contribution copes with complex loss geometries by learning a versatile distance-generating function, which induces a nonlinear mirror map that effectively captures and optimizes a wide range of loss geometries. With suitable parameterization, this generating function is realized by an expressive neural network whose induced distance is provably valid. Analytical results establish convergence not only of the proposed method, but of all meta-learning approaches based on preconditioning. To attain a gradient norm less than $\epsilon$, the $\mathcal{O}(\epsilon^{-2})$ convergence rate is on par with standard gradient-based meta-learning methods. Numerical tests on few-shot learning datasets demonstrate the superior empirical performance of the novel algorithm, as well as its rapid per-task convergence, which markedly reduces the number of adaptation steps, hence also accommodating large-scale meta-learning models.
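The mirror-descent adaptation described above can be made concrete with a small sketch. The paper parameterizes the distance-generating function $\psi$ with a neural network; the per-coordinate $\cosh$-based surrogate below (chosen because its mirror map $\nabla\psi$ inverts in closed form), the toy quadratic task, and all parameter values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

# Per-coordinate convex distance-generating function (an illustrative stand-in
# for the paper's neural parameterization):
#   psi_i(t) = (a_i / b_i**2) * cosh(b_i * t),  strictly convex for a_i > 0,
# whose mirror map grad-psi is invertible in closed form via arcsinh.

def mirror_map(theta, a, b):
    """grad psi: send primal parameters to the dual (mirror) space."""
    return (a / b) * np.sinh(b * theta)

def inverse_mirror_map(z, a, b):
    """(grad psi)^{-1}: send dual-space points back to primal parameters."""
    return np.arcsinh(b * z / a) / b

def mirror_descent_adapt(theta0, grad_fn, a, b, lr=0.1, steps=50):
    """Per-task adaptation: mirror-descent steps under the learned geometry."""
    theta = theta0.copy()
    for _ in range(steps):
        z = mirror_map(theta, a, b) - lr * grad_fn(theta)
        theta = inverse_mirror_map(z, a, b)
    return theta

# Toy task with loss L(theta) = 0.5 * ||theta - target||^2
target = np.array([1.0, -2.0])
a, b = np.ones(2), np.full(2, 0.5)   # meta-learned in the paper; fixed here
theta = mirror_descent_adapt(np.zeros(2), lambda t: t - target, a, b)
print(theta)                          # close to target after adaptation
```

As a sanity check on the design, letting $b \to 0$ makes $\psi$ quadratic and the update reduces to preconditioned gradient descent with stepsize $\mathrm{lr}/a$, the special (linear-preconditioning) case the paper moves beyond.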
Related papers
- Memory-Reduced Meta-Learning with Guaranteed Convergence [7.306367313570251]
We propose a meta-learning algorithm that avoids using historical parameters/gradients and significantly reduces memory costs in each iteration. Experimental results on meta-learning benchmarks confirm the efficacy of the proposed algorithm.
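The paper's exact memory-reduced update is not reproduced here; for a point of reference, the sketch below shows a standard device with the same flavor, the first-order approximation popularized by FOMAML, which keeps only the current iterate during adaptation instead of the full trajectory needed for backpropagation through training. The task interface and hyperparameters are illustrative assumptions.

```python
import numpy as np

def first_order_meta_step(meta_theta, tasks, inner_lr=0.1, outer_lr=0.01,
                          inner_steps=5):
    """First-order meta-gradient: memory is O(dim), not O(steps * dim)."""
    meta_grad = np.zeros_like(meta_theta)
    for support_grad_fn, query_grad_fn in tasks:
        theta = meta_theta.copy()
        for _ in range(inner_steps):          # adaptation: no history stored
            theta -= inner_lr * support_grad_fn(theta)
        meta_grad += query_grad_fn(theta)     # gradient at adapted parameters
    return meta_theta - outer_lr * meta_grad / len(tasks)

# Tiny demo: two quadratic tasks with different optima
tasks = [(lambda th, c=c: th - c, lambda th, c=c: th - c)
         for c in (np.array([1.0, 0.0]), np.array([0.0, 1.0]))]
theta = np.zeros(2)
for _ in range(100):
    theta = first_order_meta_step(theta, tasks)
print(theta)   # drifts toward the average optimum, which adapts fast to both
```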
arXiv Detail & Related papers (2024-12-16T17:55:55Z)
- Meta-Learning with Versatile Loss Geometries for Fast Adaptation Using Mirror Descent [44.56938629818211]
A fundamental challenge in meta-learning is how to quickly "adapt" the extracted prior in order to train a task-specific model.
Existing approaches deal with this challenge using a preconditioner that enhances convergence of the per-task training process.
The present contribution addresses this limitation by learning a nonlinear mirror map, which induces a versatile distance metric.
arXiv Detail & Related papers (2023-12-20T23:45:06Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
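The setting can be illustrated with a toy Monte-Carlo experiment (our construction for illustration, not the paper's analysis): overparameterized linear regression with Gaussian features and a minimum-norm interpolating estimator, with knobs for task diversity and noise.

```python
import numpy as np

# Dimension d >> samples n per task, so the min-norm interpolator overfits;
# vary task diversity and noise to probe when the overfitting is benign.
rng = np.random.default_rng(0)
d, n = 200, 20
w_bar = rng.normal(size=d)              # shared component across tasks

def overfitted_task_error(task_diversity, noise_std):
    w_t = w_bar + task_diversity * rng.normal(size=d)   # task ground truth
    X = rng.normal(size=(n, d))                         # Gaussian features
    y = X @ w_t + noise_std * rng.normal(size=n)
    w_hat = X.T @ np.linalg.solve(X @ X.T, y)           # min-norm interpolator
    return float(np.sum((w_hat - w_t) ** 2))

print(overfitted_task_error(0.1, 0.1), overfitted_task_error(2.0, 1.0))
```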
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors, but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
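The implicit-gradient idea admits a compact generic sketch (in the spirit of implicit MAML, which iBaML generalizes; this is not the paper's algorithm): with a proximally regularized inner problem, the implicit function theorem turns the meta-gradient into a single linear solve, avoiding backpropagation through the inner loop.

```python
import numpy as np

# With inner problem  theta* = argmin_th L_s(th) + (lam/2)||th - phi||^2,
# the implicit function theorem gives the meta-gradient as
#   (I + H_s(theta*) / lam)^{-1} grad L_q(theta*),
# computed here with a dense solve (conjugate gradient replaces it at scale).

def implicit_meta_grad(query_grad, support_hessian, lam):
    A = np.eye(query_grad.size) + support_hessian / lam
    return np.linalg.solve(A, query_grad)

# Toy check: quadratic support loss with diagonal Hessian
H = np.diag([1.0, 10.0])
g = np.array([1.0, 1.0])
print(implicit_meta_grad(g, H, lam=1.0))  # damps the high-curvature direction
```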
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
- Pairwise Learning via Stagewise Training in Proximal Setting [0.0]
We combine adaptive sample size and importance sampling techniques for pairwise learning, with convergence guarantees for nonsmooth convex pairwise loss functions.
We demonstrate that sampling opposite instances at each iteration reduces the variance of the gradient, hence accelerating convergence.
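A minimal illustration of the opposite-instance idea (our toy construction, not the paper's stagewise proximal algorithm): a stochastic subgradient step on a pairwise hinge loss that always pairs one positive with one negative example, so same-class pairs, whose pairwise loss contributes nothing, are never drawn.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_hinge_grad(w, x_pos, x_neg):
    """Subgradient of max(0, 1 - w.(x_pos - x_neg)) with respect to w."""
    diff = x_pos - x_neg
    return -diff if 1.0 - w @ diff > 0 else np.zeros_like(w)

def opposite_pair_sgd_step(w, X, y, lr=0.1):
    """Draw one positive and one negative instance ("opposite instances")."""
    x_pos = X[rng.choice(np.flatnonzero(y == 1))]
    x_neg = X[rng.choice(np.flatnonzero(y == 0))]
    return w - lr * pairwise_hinge_grad(w, x_pos, x_neg)

# Tiny demo: class 1 is shifted upward, so positive weights rank it higher
y = rng.integers(0, 2, size=100)
X = rng.normal(size=(100, 5)) + 2.0 * y[:, None]
w = np.zeros(5)
for _ in range(200):
    w = opposite_pair_sgd_step(w, X, y)
print(w)
```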
arXiv Detail & Related papers (2022-08-08T11:51:01Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
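For reference, the baseline being stabilized is standard Q-learning with linear function approximation, sketched below; the paper's exploration variant and its analysis are not reproduced, and the feature map phi is an assumed user-supplied function.

```python
import numpy as np

def q_learning_update(w, phi, s, a, r, s_next, actions, gamma=0.99, lr=0.05):
    """One temporal-difference step for Q(s, a) ~ w . phi(s, a)."""
    q_next = max(w @ phi(s_next, b) for b in actions)   # greedy bootstrap
    td_error = r + gamma * q_next - w @ phi(s, a)
    return w + lr * td_error * phi(s, a)

# Micro demo: scalar state, 2 actions, hand-rolled features
phi = lambda s, a: np.array([s, float(a), 1.0])
w = q_learning_update(np.zeros(3), phi, s=0.5, a=1, r=1.0, s_next=0.2,
                      actions=[0, 1])
print(w)
```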
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field.
Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is a continuous quantity, rather than a fixed number of discrete gradient steps.
We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
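The continuous-time view can be sketched with a fixed-step forward-Euler integration of the gradient-flow ODE, where the horizon T (the trajectory length) is a real number. COMLN itself uses forward-mode differentiation through the flow; this simplified sketch only illustrates the adaptation dynamics.

```python
import numpy as np

def ode_adapt(theta0, grad_fn, T=2.0, dt=0.01):
    """Integrate the gradient flow d(theta)/dt = -grad L(theta) up to time T.
    T is continuous (and could itself be learned), unlike a step count."""
    theta, t = theta0.copy(), 0.0
    while t < T:
        theta -= dt * grad_fn(theta)   # forward-Euler step
        t += dt
    return theta

# Toy usage: flows toward the minimizer of 0.5 * ||theta - target||^2
target = np.array([3.0, -1.0])
print(ode_adapt(np.zeros(2), lambda th: th - target))
```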
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
- A Sample Complexity Separation between Non-Convex and Convex Meta-Learning [42.51788412283446]
One popular trend in meta-learning is to learn from many tasks a common method that can be used to solve a new task with few samples.
This paper shows that it is important to understand the optimization black box, specifically the subspaces learned by a linear network.
Analyses of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.
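A toy rendition of the subspace picture (our construction, with arbitrary dimensions and noise levels): per-task least-squares solutions for tasks sharing a k-dimensional subspace are stacked, and an SVD recovers the subspace onto which new-task data can be projected.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_tasks, n = 50, 3, 40, 100
U = np.linalg.qr(rng.normal(size=(d, k)))[0]     # shared ground-truth subspace
w_hats = []
for _ in range(n_tasks):
    w_t = U @ rng.normal(size=k)                 # task vector in the subspace
    X = rng.normal(size=(n, d))
    y = X @ w_t + 0.1 * rng.normal(size=n)
    w_hats.append(np.linalg.lstsq(X, y, rcond=None)[0])
U_hat = np.linalg.svd(np.stack(w_hats).T)[0][:, :k]   # meta-learned subspace
print(np.linalg.norm(U_hat @ U_hat.T @ U - U))        # small => recovered
```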
arXiv Detail & Related papers (2020-02-25T20:55:09Z)