Zeroth-order non-convex learning via hierarchical dual averaging
- URL: http://arxiv.org/abs/2109.05829v1
- Date: Mon, 13 Sep 2021 09:59:06 GMT
- Title: Zeroth-order non-convex learning via hierarchical dual averaging
- Authors: Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier
- Abstract summary: We propose a hierarchical version of dual averaging for zeroth-order online non-convex optimization.
We derive tight bounds for both the learner's static and dynamic regret - i.e., the regret incurred against the best dynamic policy in hindsight over the horizon of play.
- Score: 26.023679256204737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a hierarchical version of dual averaging for zeroth-order online
non-convex optimization - i.e., learning processes where, at each stage, the
optimizer is facing an unknown non-convex loss function and only receives the
incurred loss as feedback. The proposed class of policies relies on the
construction of an online model that aggregates loss information as it arrives,
and it consists of two principal components: (a) a regularizer adapted to the
Fisher information metric (as opposed to the metric norm of the ambient space);
and (b) a principled exploration of the problem's state space based on an
adapted hierarchical schedule. This construction enables sharper control of the
model's bias and variance, and allows us to derive tight bounds for both the
learner's static and dynamic regret - i.e., the regret incurred against the
best dynamic policy in hindsight over the horizon of play.
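The policy class described above admits a compact mixed-strategy form. Below is a minimal sketch over a finite discretization of the domain, using an entropic regularizer (softmax choice map) and importance-weighted loss estimates as simplified stand-ins for the paper's Fisher-metric regularizer and hierarchical exploration schedule; the loss function, grid, and step sizes are illustrative, not from the paper.
```python
# Mixed-strategy dual averaging with zeroth-order (bandit) feedback:
# only the incurred loss value is observed at each stage.
import numpy as np

rng = np.random.default_rng(0)

def loss(x, t):
    # Hypothetical non-convex, time-varying loss; only its value is revealed.
    return np.sin(3 * x) + 0.5 * np.cos(5 * x + 0.01 * t)

candidates = np.linspace(-1.0, 1.0, 25)  # fixed discretization of the domain
scores = np.zeros_like(candidates)       # dual variable: aggregated loss info

for t in range(1, 2001):
    eta = 1.0 / np.sqrt(t)                    # dual-averaging step size
    z = -eta * scores
    probs = np.exp(z - z.max())
    probs /= probs.sum()                      # entropic choice map (softmax)
    k = rng.choice(len(candidates), p=probs)  # sample a point to play
    ell = loss(candidates[k], t)              # bandit feedback: value only
    scores[k] += ell / probs[k]               # importance-weighted estimate

print("most played point:", candidates[np.argmax(probs)])
```
The importance weighting keeps the loss estimator unbiased at the cost of variance; controlling that bias/variance trade-off is precisely what the paper's hierarchical schedule is for.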
Related papers
- Controlled Learning of Pointwise Nonlinearities in Neural-Network-Like Architectures [14.93489065234423]
We present a general variational framework for the training of freeform nonlinearities in layered computational architectures.
The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility.
We show how to solve the function-optimization problem numerically by representing the nonlinearities in a suitable (nonuniform) B-spline basis.
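As a rough illustration of the idea, the sketch below parameterizes a freeform nonlinearity with degree-1 B-spline coefficients on a uniform knot grid and clips per-interval slopes to enforce 1-Lipschitz stability; the grid, initialization, and projection are illustrative choices, not the paper's exact construction.
```python
import numpy as np

knots = np.linspace(-3.0, 3.0, 31)      # uniform knot grid (assumption)
h = knots[1] - knots[0]
coefs = 1.5 * np.tanh(2 * knots)        # initial coefficients, slopes up to 3

def project_lipschitz(c, L=1.0):
    # Clip per-interval slopes to [-L, L], then rebuild the coefficients.
    slopes = np.clip(np.diff(c) / h, -L, L)
    return np.concatenate([[c[0]], c[0] + np.cumsum(slopes * h)])

def activation(x, c):
    # Degree-1 B-splines on a uniform grid = piecewise-linear interpolation.
    return np.interp(x, knots, c)

coefs = project_lipschitz(coefs)        # e.g., applied after each training step
print(activation(np.linspace(-4, 4, 9), coefs))
```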
arXiv Detail & Related papers (2024-08-23T14:39:27Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in few-shot image classification scenarios.
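One standard way to realize orthogonal fine-tuning is the Cayley transform of a skew-symmetric matrix; whether OrthSR uses exactly this parameterization is an assumption of the sketch below, which only illustrates that rotating pretrained weights by an orthogonal factor preserves their spectrum.
```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W_pre = rng.normal(size=(d, d))         # frozen pretrained weights (toy size)

B = rng.normal(size=(d, d)) * 0.1       # trainable parameters
A = B - B.T                             # skew-symmetric => Q is orthogonal
I = np.eye(d)
Q = (I - A) @ np.linalg.inv(I + A)      # Cayley transform

W_ft = Q @ W_pre                        # fine-tuned weights: a rotation of W
print("orthogonality error:", np.abs(Q.T @ Q - I).max())
print("spectrum preserved:", np.allclose(
    np.linalg.svd(W_ft, compute_uv=False),
    np.linalg.svd(W_pre, compute_uv=False)))
```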
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
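As a rough illustration of the projection step, the sketch below projects a prompt gradient onto the null space of a previous-task feature matrix via an SVD; the rank threshold is a simple stand-in for the paper's approximation scheme, and all dimensions are illustrative.
```python
import numpy as np

rng = np.random.default_rng(2)
d, n_old = 16, 10
F = rng.normal(size=(n_old, d))          # features of previous tasks (rows)
g = rng.normal(size=d)                   # raw gradient for the prompt

# Null-space basis of F: right singular vectors with (near-)zero singular values.
U, s, Vt = np.linalg.svd(F)
rank = (s > 1e-10).sum()
N = Vt[rank:].T                          # columns span null(F)

g_proj = N @ (N.T @ g)                   # projected update direction
print("interference with old features:", np.abs(F @ g_proj).max())  # ~0
```
Updating along `g_proj` leaves `F @ prompt` unchanged to first order, which is the mechanism behind the method's non-interference guarantee.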
arXiv Detail & Related papers (2024-06-09T05:57:40Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune online mirror descent (OMD) with self-concordant barrier regularizers.
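For concreteness, the sketch below runs one OMD step on the probability simplex with the self-concordant log-barrier regularizer psi(x) = -sum(log x_i); the normalization multiplier is found by bisection. The meta-learned tuning of the step size and barrier from the paper is not shown, and all constants are illustrative.
```python
import numpy as np

def omd_barrier_step(x, g, eta):
    # Mirror step: -1/x_new_i = -1/x_i - eta*g_i - nu, with nu normalizing.
    a = 1.0 / x + eta * g
    lo = -a.min() + 1e-12                # x_new_i = 1/(a_i + nu) must be > 0
    hi = -a.min() + len(a)               # here sum(1/(a_i + nu)) <= 1 holds
    for _ in range(100):                 # bisect for the normalizing nu
        nu = 0.5 * (lo + hi)
        if (1.0 / (a + nu)).sum() > 1.0:
            lo = nu
        else:
            hi = nu
    return 1.0 / (a + nu)

x = np.full(3, 1.0 / 3.0)
g = np.array([1.0, -0.5, 0.2])           # loss gradient revealed this round
x = omd_barrier_step(x, g, eta=0.1)
print(x, x.sum())
```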
arXiv Detail & Related papers (2023-07-05T13:52:10Z) - Let Offline RL Flow: Training Conservative Agents in the Latent Space of
Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
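A minimal sketch of this idea, assuming the simplest mixture class: a softmax distribution over the M base controllers, updated with a REINFORCE-style gradient. The toy scalar environment and base controllers below are illustrative, not from the paper.
```python
import numpy as np

rng = np.random.default_rng(3)
controllers = [lambda s: -0.5 * s, lambda s: -1.5 * s]   # M = 2 base gains

def episode_cost(k, horizon=20):
    # Noisy scalar linear system s' = s + a + w, with quadratic cost.
    s, cost = 1.0, 0.0
    for _ in range(horizon):
        a = controllers[k](s)
        cost += s**2 + 0.1 * a**2
        s = s + a + 0.1 * rng.normal()
    return cost

theta = np.zeros(len(controllers))       # mixture logits
for t in range(500):
    p = np.exp(theta - theta.max()); p /= p.sum()
    k = rng.choice(len(p), p=p)          # sample a controller for the episode
    c = episode_cost(k)
    grad = c * (np.eye(len(p))[k] - p)   # REINFORCE estimate, no baseline
    theta -= 0.01 * grad                 # descend the expected cost
print("mixture weights:", p)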
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Primal-dual Learning for the Model-free Risk-constrained Linear
Quadratic Regulator [0.8629912408966145]
Risk-aware control, though promising for handling unexpected events, typically requires an exact dynamical model.
We propose a model-free framework to learn a risk-aware controller, with a focus on linear systems.
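The primal-dual scheme named in the title can be illustrated generically: the controller parameter descends the Lagrangian while a dual variable ascends on constraint violation. The cost and risk oracles below are hypothetical placeholders; the paper's model-free estimators for the LQR setting are not reproduced.
```python
import numpy as np

def cost(theta): return (theta - 1.0)**2           # placeholder control cost
def risk(theta): return theta**2                   # placeholder risk measure
def grad(f, x, h=1e-5): return (f(x + h) - f(x - h)) / (2 * h)

theta, lam, budget = 0.0, 0.0, 0.5                 # risk budget b = 0.5
for t in range(2000):
    theta -= 0.01 * (grad(cost, theta) + lam * grad(risk, theta))  # primal
    lam = max(0.0, lam + 0.01 * (risk(theta) - budget))            # dual
print(f"theta={theta:.3f}, risk={risk(theta):.3f}, lambda={lam:.3f}")
```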
arXiv Detail & Related papers (2020-11-22T04:40:15Z) - Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a
Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem.
We produce a global linear convergence guarantee for the setting of finite time horizon and state dynamics under weak assumptions.
We show results both for the case where we assume a model for the underlying dynamics and for the case where we apply the method to data directly.
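A minimal sketch of policy gradient in this setting: the policy is a static feedback gain K whose gradient is estimated from noisy finite-horizon rollouts via smoothed two-point differences. The scalar dynamics and all constants are illustrative.
```python
import numpy as np

rng = np.random.default_rng(4)
A_sys, B_sys, T = 0.9, 0.5, 20

def rollout_cost(K):
    s, cost = 1.0, 0.0
    for _ in range(T):
        a = -K * s
        cost += s**2 + 0.1 * a**2
        s = A_sys * s + B_sys * a + 0.05 * rng.normal()  # noisy dynamics
    return cost

K, delta, lr = 0.0, 0.1, 0.02
for t in range(300):
    u = rng.choice([-1.0, 1.0])             # random perturbation direction
    g = (rollout_cost(K + delta * u) - rollout_cost(K - delta * u)) / (2 * delta) * u
    K -= lr * g                             # gradient step on the gain
print("learned gain K:", K)
```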
arXiv Detail & Related papers (2020-11-20T09:51:49Z) - Online non-convex optimization with imperfect feedback [33.80530308979131]
We consider the problem of online learning with non-convex losses.
In terms of feedback, we assume that the learner observes - or otherwise constructs - an inexact model for the loss function at each stage.
We propose a mixed-strategy learning policy based on dual averaging.
arXiv Detail & Related papers (2020-10-16T16:53:13Z) - A block coordinate descent optimizer for classification problems
exploiting convexity [0.0]
We introduce a coordinate descent method for deep networks on classification tasks that exploits convexity of the cross-entropy loss in the weights of the linear output layer.
By alternating between a second-order method that finds globally optimal parameters for the linear layer and gradient descent for the hidden layers, we ensure an optimal fit of the adaptive basis to data throughout training.
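A minimal sketch of the alternating scheme: with the hidden features fixed, the cross-entropy loss is convex in the output weights, so they are fit with a few Newton steps before the hidden layer takes a gradient step. Binary labels, a tanh hidden layer, and all sizes are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)      # XOR-like toy labels

W1 = rng.normal(size=(2, 8)) * 0.5             # hidden-layer weights
w2 = np.zeros(8)                               # output (linear-layer) weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for it in range(50):
    H = np.tanh(X @ W1)                        # hidden features, held fixed
    for _ in range(5):                         # Newton on the convex subproblem
        p = sigmoid(H @ w2)
        grad = H.T @ (p - y) / len(y)
        Hess = H.T @ (H * (p * (1 - p))[:, None]) / len(y) + 1e-4 * np.eye(8)
        w2 -= np.linalg.solve(Hess, grad)
    p = sigmoid(H @ w2)                        # then one step on the hidden layer
    dZ = np.outer(p - y, w2) * (1 - H**2) / len(y)
    W1 -= 0.5 * X.T @ dZ
print("train accuracy:", ((p > 0.5) == y.astype(bool)).mean())
```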
arXiv Detail & Related papers (2020-06-17T19:49:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.